WO2024082108A1 - Method for encoding and decoding a 3d point cloud, encoder, decoder - Google Patents


Info

Publication number
WO2024082108A1
Authority
WO
WIPO (PCT)
Prior art keywords
point cloud
cuboid
bitstream
triangles
normal vector
Prior art date
Application number
PCT/CN2022/125764
Other languages
French (fr)
Inventor
Shuo Gao
Original Assignee
Beijing Xiaomi Mobile Software Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co., Ltd.
Priority to PCT/CN2022/125764
Publication of WO2024082108A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00: Image coding
    • G06T 9/001: Model-based coding, e.g. wire frame
    • G06T 9/004: Predictors, e.g. intraframe, interframe coding
    • G06T 9/40: Tree coding, e.g. quadtree, octree
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/597: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding

Definitions

The vertices are ordered clockwise, and it is not important which of the vertices is chosen as V_1. This construction preserves the natural symmetries of the model without arbitrarily privileging some triangles. Moreover, it provides an additional degree of freedom to improve the accuracy of the model, namely the position of the centroid point C.
The position of the centroid point C might be further improved by coding a residual position in the bitstream, such that the centroid point C is closer to the original points of the point cloud:

C = C_mean + C_res

where C_mean is the mean position obtained by averaging the coordinates of all (ordered) vertices, and C_res is a coded residual. The coded residual may be a 3D residual. However, a 3D residual is rarely advantageous because it requires many bits to be coded, and these extra bits are not fully compensated by the better accuracy of the model. Therefore, it is preferred to code a 1D residual C_res.
Instead, a normal vector n may be constructed, as shown in Figure 11, and the residual may be determined by

C_res = α · n

where α is a 1D signed scalar value coded in the bitstream, see Figure 12. The normal vector n may be derived by the following two steps: the normal vectors of the constructed triangles, each obtained as the cross product (also named vector (cross) product) between two edge vectors, are summed, and the sum is then normalized. Alternatively, the vector n may be taken parallel to an axis in order to simplify its determination and the computation of the value α; in particular, it may be taken parallel to the dominant axis as a good approximation of the vector computed above.
The value α is determined by the encoder, encoded into the bitstream and obtained by the decoder by decoding the bitstream. The value α may be binarized, and each bit may be encoded by using a binary entropy coder such as an arithmetic coder or a context-adaptive binary coder like CABAC.
The value α may be determined by the encoder by considering all points P_k of the point cloud belonging to a current leaf node. For each point P_k, its distance d_k from the line (C_mean, n) is found by d_k = ||(P_k − C_mean) × n||, and only points located within a pre-determined threshold th (e.g. th = 2) from the line are considered. The 1D residual r_k of a point P_k relative to the mean point C_mean is obtained by the scalar product (also named inner product or dot product) r_k = (P_k − C_mean) · n. The value α may then be obtained as the average α = (1/|S|) Σ_{k∈S} r_k, where S is the set of points P_k such that their distance d_k is below the threshold th, and |S| is the number of points belonging to this set. A sketch of this computation is given below.
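The following is a minimal sketch of this encoder-side computation, assuming a unit normal vector and NumPy arrays; the function name and the handling of an empty set S are illustrative assumptions, not taken from the specification.

```python
import numpy as np

def compute_alpha(points, c_mean, normal, th=2.0):
    """Estimate the 1D centroid residual alpha for one leaf node.

    points : (N, 3) array of original points P_k inside the leaf node
    c_mean : (3,) mean position of the ordered TriSoup vertices
    normal : (3,) unit normal vector of the leaf node
    th     : distance threshold selecting points near the line (C_mean, n)
    """
    diff = points - c_mean                      # P_k - C_mean
    # distance d_k of P_k from the line through C_mean with direction n
    d = np.linalg.norm(np.cross(diff, normal), axis=1)
    selected = d < th                           # the set S
    if not np.any(selected):
        return 0.0                              # assumed fallback: no residual
    r = diff[selected] @ normal                 # 1D residuals r_k
    return float(np.mean(r))                    # alpha = average residual
```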
The normal vector of each triangle can be obtained by the cross product of its edges. For example, the normal vector of triangle V_1 V_2 C_mean can be obtained by the cross product of (V_1 − C_mean) and (V_2 − C_mean), which can be described by the equation n_1 = (V_1 − C_mean) × (V_2 − C_mean). The normal vector of the leaf node in this example is determined by the following two steps: the normal vectors of all triangles constructed from the vertices and C_mean are summed, and the sum is normalized. The residual α is then found along the line constructed from the mean point C_mean and the normal vector; the aim is to let the position of the centroid point C be closer to the original points of the point cloud in the leaf node, wherein C = C_mean + α · n.
In this determination, however, the weight of each normal vector is not considered. This may make the obtained normal vector of a leaf node (used for TriSoup coding) non-optimal, because in many cases the number of points in a leaf node is not the same for each triangle of the leaf node, for example when the surface of the point cloud has many concave and convex areas. In such cases the unweighted normal vectors of the triangles may not lead to the optimal normal vector of the leaf node, which causes a non-optimal position of the centroid point C. Further, the positions of the reconstructed points may have larger errors relative to the original points of the point cloud, which causes a lower compression efficiency.
It is therefore proposed to determine the normal vector of a leaf node based on a weighted average of the normal vectors of the triangles in the leaf node, to achieve a better compression efficiency. The weight might be determined based on the area of each triangle constructed from the vertices and C_mean. The normal vector of a triangle with a larger area then has a bigger impact on the normal vector determination, so the obtained normal vector tends to be closer to the true surface normal, since a triangle with a larger area is more likely to contain more points of the original point cloud. In other words, the obtained normal vector can be refined to point in a direction where more original points are likely to exist, by making use of the area information of each triangle. As a result, a better position of the centroid point C is obtained, which causes smaller errors for the reconstructed point positions, so the compression efficiency can be further improved.
In this case the normal vector of a leaf node is obtained by a weighted average based on the area of each triangle, and might be determined by the following two steps: first the weighted sum

n = Σ_{i=1..M} (A_i / A_Sum) · n_i

is computed, and then the result is normalized, where A_i represents the area of the i-th triangle in the leaf node, n_i is the normal vector of the i-th triangle, A_Sum = A_1 + A_2 + ... + A_M is defined as the sum of the areas of all triangles, and M is the number of triangles constructed in the leaf node for TriSoup coding.
Alternatively, the weight used in the normal vector determination may be based on the square root of the area of each triangle in the leaf node; the weighted average of the normal vectors is then obtained with weights sqrt(A_i) / A_Sum, where A_Sum = sqrt(A_1) + sqrt(A_2) + ... + sqrt(A_M). The weight may instead be based on the square of the area of each triangle, with weights A_i^2 / A_Sum, where A_Sum = A_1^2 + A_2^2 + ... + A_M^2. More generally, the weight may be based on an exponent p of the area of each triangle, with weights A_i^p / A_Sum, where A_Sum = A_1^p + A_2^p + ... + A_M^p. The parameter p controls the degree of impact of the triangle area in the normal vector determination and can be set depending on the kind of point cloud data to achieve the best compression efficiency. For example, p can be set larger than 1 for point cloud data that has many concave and convex areas.
The area of each triangle (V_1 V_2 C_mean, V_2 V_3 C_mean, ..., V_{M-1} V_M C_mean, V_M V_1 C_mean) can be obtained from the cross product (also named vector (cross) product) of two edge vectors, because the magnitude of the cross product equals the area of the parallelogram spanned by the two edges (see Figure 14), half of which is the triangle area. For example, the area A_1 of the 1st triangle can be obtained by A_1 = ||(V_1 − C_mean) × (V_2 − C_mean)|| / 2, and in general the area A_i of the i-th triangle can be obtained by A_i = ||(V_i − C_mean) × (V_{i+1} − C_mean)|| / 2, with V_{M+1} = V_1. A sketch of the complete area-weighted normal vector determination is given below.
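The following is a minimal sketch of the area-weighted normal vector determination described above, assuming NumPy and clockwise-ordered vertices. Whether the per-triangle normals are normalized before averaging is not specified here and is treated as an implementation choice (they are normalized in this sketch); the function and parameter names are illustrative.

```python
import numpy as np

def leaf_node_normal(vertices, c_mean, p=1.0):
    """Area-weighted normal vector of a leaf node for TriSoup coding.

    vertices : (M, 3) array of clockwise-ordered TriSoup vertices V_1..V_M
    c_mean   : (3,) mean position of the vertices (the pivot point)
    p        : exponent controlling the impact of the triangle areas
               (p = 0.5, 1 and 2 give the variants described above)
    """
    m = len(vertices)
    tri_normals, weights = [], []
    for i in range(m):
        e1 = vertices[i] - c_mean                # edge V_i - C_mean
        e2 = vertices[(i + 1) % m] - c_mean      # edge V_{i+1} - C_mean
        n_i = np.cross(e1, e2)                   # cross product of the edges
        area = 0.5 * np.linalg.norm(n_i)         # triangle area A_i
        if area > 0.0:
            tri_normals.append(n_i / (2.0 * area))  # unit triangle normal
            weights.append(area ** p)            # weight A_i ** p
    a_sum = sum(weights)                         # A_Sum = sum of A_i ** p
    if a_sum == 0.0:
        return np.zeros(3)                       # degenerate node
    n = sum((w / a_sum) * n_i for w, n_i in zip(weights, tri_normals))
    norm = np.linalg.norm(n)
    return n / norm if norm > 0.0 else n         # step 2: normalize the sum
```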
The proposed determination method of the normal vector of a leaf node can be used both in the encoding process and in the decoding process of TriSoup coding. In the encoding process, the encoder determines the vertex information of each leaf node, which includes the vertex presence flag of each edge of the leaf node and the vertex position on an edge (if the vertex presence flag of the edge is true), and encodes this TriSoup information into the bitstream by entropy coding. It then obtains the centroid residual value α and encodes it into the bitstream by entropy coding.
In the TriSoup decoding process, the decoder follows these steps: it decodes the vertex information of each leaf node from the bitstream, which includes the vertex presence flag of each edge of the leaf node and the vertex position on an edge (if the vertex presence flag of the edge is true), and it decodes the centroid residual value α from the bitstream. The position of the centroid point C of the leaf node can then be reconstructed by C = C_mean + α · n, with the normal vector n determined as described above; a sketch of this reconstruction follows.
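A minimal sketch of the decoder-side centroid reconstruction, reusing the leaf_node_normal sketch above. That the decoder recomputes C_mean and n from the decoded vertices follows from the description; the function names remain illustrative.

```python
import numpy as np

def reconstruct_centroid(vertices, alpha, p=1.0):
    """Decoder-side reconstruction of the centroid point C of a leaf node.

    vertices : (M, 3) array of decoded, clockwise-ordered TriSoup vertices
    alpha    : decoded 1D signed centroid residual
    p        : area-weighting exponent (must match the encoder)
    """
    c_mean = vertices.mean(axis=0)               # virtual (mean) position
    n = leaf_node_normal(vertices, c_mean, p)    # weighted unit normal
    return c_mean + alpha * n                    # C = C_mean + alpha * n
```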

Abstract

Method for decoding, from a bitstream, the geometry of a 3D point cloud, preferably implemented in a decoder, including: receiving and decoding a bitstream, wherein the bitstream contains octree information including information about octree structure of the volume of the point cloud and vertex information including information about vertex presence and position of a vertex on edges of cuboids of leaf nodes of the octree structure; determining a virtual position by averaging the positions of vertices of one cuboid relating to a leaf node of the octree structure; constructing triangles by connecting clockwise two consecutive vertices of one cuboid and the virtual position; determining a normal vector of the cuboid based on the constructed triangle; determining a centroid position based on the normal vector; reconstructing triangles by connecting clockwise two consecutive vertices of the cuboid and the centroid position; voxelization of the reconstructed triangles to determine points of the point cloud, wherein the normal vector of the cuboid is determined based on areas of the determined triangles.

Description

Method for encoding and decoding a 3D point cloud, encoder, decoder Technical Field
The present invention relates to a method for decoding a 3D point cloud from a bitstream. Additionally, it is an object of the present invention to provide a method for encoding a 3D point cloud into a bitstream. Further, it is an object of the present invention to provide an encoder and decoder, a bitstream encoded according to the present invention and a software. In particular, it is an object of the present invention to provide a method with increased accuracy of the decoding or reconstruction process of a 3D point cloud.
Background
As a format for the representation of 3D data, point clouds have recently gained traction as they are versatile in their capability of representing all types of 3D objects or scenes. Therefore, many use cases can be addressed by point clouds, among which are
· movie post-production,
· real-time 3D immersive telepresence or VR/AR applications,
· free viewpoint video (for instance for sports viewing) ,
· Geographical Information Systems (aka cartography) ,
· culture heritage (storage of scans of rare objects into a digital form) ,
· autonomous driving, including 3D mapping of the environment and real-time Lidar data acquisition.
A point cloud is a set of points located in a 3D space, optionally with additional values attached to each of the points. These additional values are usually called point attributes. Consequently, a point cloud is a combination of a geometry (the 3D position of each point) and attributes.
Attributes may be, for example, three-component colours, material properties like reflectance and/or two-component normal vectors to a surface associated with the point.
Point clouds may be captured by various types of devices like an array of cameras, depth sensors, Lidars, scanners, or may be computer-generated (in movie post-production, for example). Depending on the use case, point clouds may have from thousands up to billions of points, for instance for cartography applications.
Raw representations of point clouds require a very high number of bits per point, with at least a dozen bits per spatial component X, Y or Z, and optionally more bits for the attribute(s), for instance three times 10 bits for the colours. Practical deployment of point-cloud-based applications requires compression technologies that enable the storage and distribution of point clouds with reasonable storage and transmission infrastructures.
Compression may be lossy (like in video compression) for the distribution to and visualization by an end-user, for example on AR/VR glasses or any other 3D-capable device. Other use cases do require lossless compression, like medical applications or autonomous driving, to avoid altering the results of a decision obtained from the analysis of the compressed and transmitted point cloud.
Until recently, point cloud compression (aka PCC) was not addressed by the mass market and no standardized point cloud codec was available. In 2017, the standardization working group ISO/IEC JTC1/SC29/WG11, also known as the Moving Picture Experts Group or MPEG, initiated work items on point cloud compression. This has led to two standards, namely
· MPEG-I part 5 (ISO/IEC 23090-5) or Video-based Point Cloud Compression (V-PCC)
· MPEG-I part 9 (ISO/IEC 23090-9) or Geometry-based Point Cloud Compression (G-PCC)
Both V-PCC and G-PCC standards have finalized their first version in late 2020.
The V-PCC coding method compresses a point cloud by performing multiple projections of a 3D object to obtain 2D patches that are packed into an image (or a video when dealing with moving point clouds) . Obtained images or videos are then compressed using already existing image/video codecs, allowing for the leverage of already deployed image and video solutions. By its very nature, V-PCC is efficient only on dense and continuous point clouds because image/video codecs are unable to compress non-smooth patches as would be obtained from the projection of, for example, Lidar-acquired sparse geometry data.
The G-PCC coding method has two schemes for the compression of the geometry. The first scheme is based on an occupancy tree (octree/quadtree/binary tree) representation of the point cloud geometry. Occupied nodes are split down until a certain size is reached, and occupied leaf nodes provide the locations of points, typically at the centre of these nodes. By using neighbour-based prediction techniques, a high level of compression can be obtained for dense point clouds. Sparse point clouds are also addressed by directly coding the positions of points within a node of non-minimal size, by stopping the tree construction when only isolated points are present in a node; this technique is known as Direct Coding Mode (DCM).
The second scheme is based on a predictive tree, in which each node represents the 3D location of one point and the relation between nodes is spatial prediction from parent to children. This method can only address sparse point clouds and offers the advantage of lower latency and simpler decoding than the occupancy tree. However, its compression performance is only marginally better, and the encoding is complex relative to the first, occupancy-based method, as it intensively searches for the best predictor (among a long list of potential predictors) when constructing the predictive tree.
In both schemes, attribute (de) coding is performed after complete geometry (de) coding, leading to a two-pass coding. Thus, low latency is obtained by using slices that decompose the 3D space into sub-volumes that are coded independently, without prediction between the sub-volumes. This may heavily impact the compression performance when many slices are used.
An important use case is the transmission of dynamic AR/VR point clouds. Dynamic means that the point cloud evolves with respect to time. Also, AR/VR point clouds are typically locally 2D, as they most of the time represent the surface of an object. As such, AR/VR point clouds are highly connected (or said to be dense) in the sense that a point is rarely isolated and, instead, has many neighbours.
Dense (or solid) point clouds represent continuous surfaces with a resolution such that volumes (small cubes called voxels) associated with points touch each other without exhibiting any visual hole in the surface.
Such point clouds are typically used in AR/VR environments and are viewed by the end user through a device like a TV, a smartphone or a headset. They are transmitted to the device or stored locally. Many AR/VR applications use moving point clouds, as opposed to static point clouds, that vary with time. Therefore, the volume of data is huge and must be compressed. Nowadays, lossless compression based on an octree representation of the geometry of the point cloud can achieve down to slightly less than one bit per point (1 bpp). This may not be sufficient for real-time transmission, which may involve several million points per frame with a frame rate as high as 50 frames per second (fps), thus leading to hundreds of megabits of data per second.
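As a worked illustration (the frame size of 5 million points is an assumed example, not a figure from the text):

5 × 10^6 points/frame × 1 bit/point × 50 frames/s = 2.5 × 10^8 bit/s = 250 Mbit/s,

so even the roughly 1 bpp achieved by lossless octree coding already produces hundreds of megabits per second.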
Consequently, lossy compression may be used with the usual requirement of maintaining an acceptable visual quality while compressing sufficiently to fit within the bandwidth provided by the transmission channel, while maintaining real-time transmission of the frames. In many applications, bitrates as low as 0.1 bpp (10x more compressed than lossless coding) would already make real-time transmission possible.
The codec VPCC, based on MPEG-I part 5 (ISO/IEC 23090-5) or Video-based Point Cloud Compression (V-PCC), can achieve such low bitrates by using lossy compression of video codecs that compress 2D frames obtained from the projection of the point cloud on a plane. The geometry is represented by a series of projection patches assembled into a frame, each patch being a small local depth map. However, VPCC is not versatile and is limited to a narrow type of point clouds that do not exhibit locally complex geometry (like trees, hair), because the obtained projected depth map would not be smooth enough to be efficiently compressed by a video codec.
Purely 3D compression techniques can handle any type of point clouds. It is still an open question whether 3D compression techniques can compete with VPCC (or any projection + image coding scheme) on dense point clouds. Standardization is still under way toward offering an extension (an amendment) of GPCC that would provide competitive lossy compression, compressing dense point clouds as well as VPCC intra while maintaining the versatility of GPCC, which can handle any type of point clouds (dense, Lidar, 3D maps). This extension is likely to use the so-called TriSoup coding scheme that works on top of an octree, as detailed in the next sections. TriSoup is under exploration in the standardization working group JTC1/SC29/WG7 of ISO/IEC.
However, as for all lossy compression schemes, the quality of reconstruction of the points of the point cloud is essential.
SUMMARY
Thus, it is an object of the present invention to provide a method for decoding geometry of a 3D point cloud from a bitstream, as well as a method for encoding a 3D point cloud into a bitstream, with increased efficiency.
The problem is solved by a method for decoding according to claim 1, a method for encoding according to claim 2, an encoder according to claim 6, a decoder according to claim 7, a bitstream according to claim 8 and a software according to claim 9.
In a first aspect a method for decoding geometry of a 3D point cloud from a bitstream is provided preferably implemented in a decoder. The method includes:
receiving and decoding a bitstream, wherein the bitstream contains octree information including information about octree structure of the volume of the point cloud and vertex information including information about vertex presence and position of a vertex on edges of cuboids of leaf nodes of the octree structure;
determining a virtual position by averaging the positions of vertices of one cuboid relating to a leaf node of the octree structure;
constructing triangles by connecting clockwise two consecutive vertices of one cuboid and the virtual position;
determining a normal vector of the cuboid based on the constructed triangle;
determining a centroid position based on the virtual position and the normal vector;
reconstructing triangles by connecting clockwise two consecutive vertices of the cuboid and the centroid position;
voxelization of the reconstructed triangles to determine points of the point cloud,
wherein the normal vector of the cuboid is determined based on areas of the determined triangles.
Thus, in a first step a bitstream is received, and the bitstream contains information regarding the octree structure of the volume of the point cloud, which is decoded. Preferably, the geometry of the point cloud is GPCC-encoded. Thus, by decoding from the bitstream, the octree information about the volume of the point cloud is provided. Further, the bitstream also includes vertex information including information about vertex presence and position of a vertex on edges of the cuboids relating to leaf nodes in the octree structure. Thus, the vertex information is provided by decoding from the bitstream. Therein, the bitstream is preferably encoded by a TriSoup encoding scheme at the encoder.
After decoding the octree information and vertex information from the bitstream as described in the previous step, in a further step, for reconstructing the point cloud geometry, triangles are determined for each cuboid. In particular, a virtual position is first determined by averaging the positions of the vertices of one cuboid. Then, a first set of triangles is constructed by connecting clockwise two consecutive vertices on the edges of a cuboid and the determined virtual position. Thus, the surfaces of the first set of triangles are determined by the positions of the vertices included in the bitstream and the determined virtual position (i.e., the mean position of the vertices).
Subsequently, a normal vector of the cuboid is determined based on the constructed triangles. Therein, a normal vector is a vector that is perpendicular to a given object. Thus, the normal vector of each of the first set of triangles can be determined respectively. Then, the normal vector of the cuboid may be determined based on the sum of the determined normal vectors of the first set of triangles. According to the present invention, the normal vectors of the first set of triangles are weighted. Preferably, the weight is determined based on the areas of the first set of triangles. Triangles with a larger area are more likely to contain more points of the point cloud and are thus given a larger weight. Then a centroid position is determined based on the virtual position and the weighted normal vector of the cuboid. With such adjustment, the determined centroid position might be closer to the original positions of the point cloud in the leaf node. The triangles are then reconstructed according to the vertices and the centroid in a similar manner as previously described.
In order to reconstruct the points of the point cloud from the reconstructed triangles, voxelization is performed by a ray-tracing process, wherein rays are launched along the three directions parallel to the three axes. Their origin is a point of integer coordinates corresponding to the sampling precision wanted for the rendering. The intersection point (if any) of a ray with one of the reconstructed triangles is then determined and added to the list of rendered points, i.e., added to the points of the point cloud. The surface of the reconstructed triangles is thus sampled by the rays during voxelization in order to determine the points of the point cloud.
Preferably, the bitstream further contains an additional value α and the centroid position is further determined based on the additional value α. Therein, the additional value is determined by the encoder and might be used to calculate the centroid position.
Preferably, the additional value α is determined based on the positions of all the points of the point cloud belonging to the current leaf node. Thus, the additional value might be determined by the encoder by considering all the points of the point cloud belonging to the current leaf node. Preferably, only points located within a pre-determined distance from the line (defined by the virtual point and the normal vector) are considered. With the additional value, the accuracy of the centroid point might be further improved.
In another aspect of the present invention a method for encoding a 3D point cloud into a bitstream is provided preferably implemented in an encoder. The method for encoding the 3D point cloud includes:
obtaining octree information including an octree structure of a volume including a plurality of cuboids;
obtaining vertex information from surfaces of the point cloud for each cuboid relating to a leaf node, wherein the vertex information includes information about vertex presence and position of a vertex on edges of the cuboid;
encoding the octree information and the vertex information into a bitstream;
reconstructing the point cloud geometry data by using octree information and vertex information obtained in the preceding encoding process, wherein reconstructing the point cloud data includes:
determining a virtual position by averaging the positions of vertices of one cuboid relating to a leaf node of the octree structure;
constructing triangles by connecting clockwise two consecutive vertices of one cuboid and the virtual position;
determining a normal vector of the cuboid based on the constructed triangle;
determining a centroid position based on the normal vector;
reconstructing triangles by connecting clockwise two consecutive vertices of the cuboid and the centroid position;
voxelization of the reconstructed triangles to determine points of the point cloud,
wherein the normal vector of the cuboid is determined based on areas of the determined triangles.
Thus, by the method for encoding, the octree information as well as the vertex information are generated. This information is encoded into the bitstream. Subsequently, at the encoder side, a reconstruction step is performed. In this reconstruction step the point cloud geometry information is reconstructed, wherein the steps of reconstructing are the same as those in the method for decoding as described above. The reconstructed geometry of the point cloud at the encoder side is then used to encode the attributes (colour, reflectance, ...) of the points of the point cloud, for example by RAHT (Region-Adaptive Hierarchical Transform), the predicting transform or the lifting transform.
Preferably, geometry of the point cloud is encoded into the bitstream by Geometry-based Point Cloud Compression (G-PCC) .
Preferably, the bitstream is an MPEG G-PCC compliant bitstream.
Preferably, the method for encoding is further built according to the features described before in connection with the method for decoding.
In another aspect of the present invention an encoder is provided for encoding a 3D point cloud into a bitstream. The encoder comprises a memory and a processor, wherein instructions are stored in the memory, which when executed by the processor perform the steps of the method for encoding described before.
In another aspect of the present invention a decoder is provided for decoding a 3D point cloud from a bitstream. The decoder comprises a memory and a processor, wherein instructions are stored in the memory, which when executed by the processor perform the steps of the method for decoding described before.
In another aspect of the present invention a bitstream is provided, wherein the bitstream is encoded by the steps of the method for encoding described before.
In another aspect of the present invention a computer-readable storage medium is provided comprising instructions to perform the steps of the method for encoding a 3D point cloud into a bitstream as described above.
In another aspect of the present invention a computer-readable storage medium is provided comprising instructions to perform the steps of the method for decoding a 3D point cloud from a bitstream as described above.
FIGURES
In the following the present invention is described in more detail with reference to the accompanying figures.
The figures show:
Figure 1 a flow diagram of the method for decoding a 3D point cloud geometry according to the present invention,
Figure 2 an example for generation of the octree structure,
Figure 3 an octree according to Figure 2,
Figure 4 an example for determining vertices on edges of a cuboid,
Figure 5 an example for generating triangles,
Figure 6 an example of vertices on the edges of a cuboid,
Figure 7 a generation of the triangle by vertices,
Figure 8 an example of determining the order of the triangles according to Figure 7,
Figure 9 a schematic drawing for the step of voxelization,
Figure 10 an example of reconstruction of triangles using the centroid point C as a pivoting point,
Figure 11 an example of normal vector,
Figure 12 an example of a 1D residual along the normal vector,
Figure 13 an example of normal vector determination process,
Figure 14 an example of calculating the area of a parallelogram.
DETAILED DESCRIPTION
Reference is made to Figure 1, which shows a schematic diagram of the method for decoding geometry information of a 3D point cloud from a bitstream.
The method for decoding geometry of a 3D point cloud from a bitstream, preferably implemented in a decoder, includes the steps:
receiving and decoding a bitstream, wherein the bitstream contains octree information including information about octree structure of the volume of the point cloud and vertex information including information about vertex presence and position of a vertex on edges of cuboids of leaf nodes of the octree structure;
determining a virtual position by averaging the positions of vertices of one cuboid relating to a leaf node of the octree structure;
constructing triangles by connecting clockwise two consecutive vertices of one cuboid and the virtual position;
determining a normal vector of the cuboid based on the constructed triangle;
determining a centroid position based on the normal vector;
reconstructing triangles by connecting clockwise two consecutive vertices of the cuboid and the centroid position;
voxelization of the reconstructed triangles to determine points of the point cloud,
wherein the normal vector of the cuboid is determined based on areas of the determined triangles.
The first step of the geometry encoding process, in order to determine the octree information, is to build and encode an octree, as illustrated in Figures 2 and 3. The bounding box is the main volume 100 that contains all the points, and is associated with the root node 112 (i.e. the single node at the top of the tree 110). This volume 100 is first divided into 8 sub-volumes 102 called octants, each represented by a node 114 in the tree 110. The octants 106 that are occupied by at least one point, which are shaded in Figures 2 and 3, are then recursively split into sub-volumes 104 until a target level is reached.
Each octant (or node) is represented by an occupancy byte that contains one bit per child octant, set to one if it is occupied by at least one point, or to zero otherwise. The occupancy bytes 118 of all the octants are serialized (in breadth-first order) and entropy coded with a binary arithmetic encoder.
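A minimal sketch of this occupancy-byte construction and breadth-first serialization; the node representation and names are illustrative assumptions, and the subsequent binary arithmetic coding stage is omitted.

```python
from collections import deque

def serialize_octree(root, max_depth):
    """Breadth-first serialization of octree occupancy bytes (sketch).

    Each node is assumed to expose .children, a list of 8 entries that are
    either a child node or None; these names are illustrative, not G-PCC.
    Returns the list of occupancy bytes to be entropy coded.
    """
    occupancy_bytes = []
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue                       # leaf (target) level reached
        byte = 0
        for i, child in enumerate(node.children):
            if child is not None:          # octant occupied by >= 1 point
                byte |= 1 << i
                queue.append((child, depth + 1))
        occupancy_bytes.append(byte)       # one byte per internal node
    return occupancy_bytes
```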
Figure 4 illustrates a blocking representation of a 3D surface 210, as well as an example of a block 220 in a TriSoup. The surface 210 intersects the block 220, which is therefore an occupied block, and the block 220 exists among multiple blocks 200 in 3D space. Within the block 220, the enclosed portion of the surface 210 intersects the edges of the block at six illustrated vertices of a polygon 230. An edge of the block 220 is said to be selected if it contains a vertex.
Figure 5 illustrates the block 220 in the TriSoup, omitting the surface 210 for clarity, and showing a non-selected edge 270, a selected edge 260, and the ith edge 250. Suppose the ith edge 250 is selected. To specify a vertex vi on edge i, one specifies a scalar value to indicate a corresponding fraction of the length of the edge 250.
As illustrated in Figures 4 and 5, within each octant 220 in the target level of the octree, the TriSoup represents the original surface 210 as a set of triangles 245. This surface is encoded and used to obtain the positions of the reconstructed (or decoded) points. First, the intersections of the surface represented by the original points with the edges of the octants are estimated by averaging the positions of the points that are the closest to those edges within the octant. Secondly, the twelve edges of all the octants and their associated intersections (if any) are stored as segments and vertices respectively. Each (unique) segment is then encoded as follows. A first single bit is arithmetically coded, set to one if the segment is occupied by a vertex and zero otherwise. If it is occupied, the relative position of the vertex on the segment is also arithmetically coded.
Vertices 310 of triangles are coded along the edges 320 of volumes associated with leaf nodes 300 of the tree, as depicted on Figure 6. These vertices 310 on edge 320 are shared among leaf nodes 300 having a common edge 320. This means that at most one vertex is coded per edge belonging to at least one leaf node. By doing so, continuity of the model is ensured through leaf nodes.
As mentioned above, the encoding of the TriSoup vertices requires two pieces of information per edge:
· a vertex flag indicating if a TriSoup vertex is present on the edge, and
· when present, the vertex position along the edge.
Consequently, the coded data consists of the octree data plus the TriSoup data.
The vertex flag is coded by an adaptive binary arithmetic coder that uses one specific context for coding vertex flags. The position of a vertex on an edge of length N=2^s might be coded with unitary precision by pushing (bypassing/not entropy coding) s bits into the bitstream.
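A minimal sketch of this per-edge coding, assuming a bit-writer object with encode_bin (context-coded) and encode_bypass (raw) methods; these method names stand in for an arithmetic coding engine and are illustrative, not the G-PCC API.

```python
def encode_vertex(bitwriter, vertex_present, position=None, s=3):
    """Sketch of TriSoup vertex coding for one edge of length N = 2**s.

    bitwriter : assumed entropy coding engine with encode_bin(bit, ctx)
                for context-coded bins and encode_bypass(bit) for raw bins
    """
    VERTEX_FLAG_CTX = 0                          # one specific context
    bitwriter.encode_bin(1 if vertex_present else 0, VERTEX_FLAG_CTX)
    if vertex_present:
        # push the s position bits without entropy coding (bypass)
        for b in range(s - 1, -1, -1):
            bitwriter.encode_bypass((position >> b) & 1)
```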
Inside a leaf node, triangles are constructed from the TriSoup vertices if at least three vertices 310 are present on the edges 320 of the leaf node 300. Reconstructed triangles 330, 340 are depicted in Figure 7.
Obviously, other combinations of triangles 330, 340 are possible. The choice of triangles comes from a three-step process:
1. determining a dominant direction along one of the three axes,
2. ordering the TriSoup vertices depending on the dominant direction,
3. constructing triangles based on the ordered list of vertices.
The exact position of the triangles within the current leaf need not be coded, as it can be deduced from the vertices.
Figure 8 will be used to explain this process. Each of the three axes is tested, and the one maximizing the total surface of the triangles is kept as the dominant axis. For simplicity of the figure, only the test over two axes is depicted in Figure 8.
A first test (top) along the vertical axis is performed by projecting the cube and the TriSoup vertices 310 vertically onto a 2D plane. The vertices 310 are then ordered clockwise relative to the center of the projected node (a square). Then, triangles 330, 340 are constructed following a fixed rule based on the ordered vertices. Here, triangles 123 and 134 are constructed systematically when 4 vertices are involved. When 3 vertices are present, the only possible triangle is 123. When 5 vertices are present, a fixed rule may be to construct triangles 123, 134 and 451. And so on, up to 12 vertices.
A second test (left) along a horizontal axis is performed by projecting the cube and the TriSoup vertices horizontally onto a 2D plane.
The vertical projection exhibits the maximum total 2D surface of triangles; thus the dominant axis is selected as vertical, and the constructed TriSoup triangles are obtained from the vertex order of the vertical projection, as shown inside the node in Figure 8. It is to be noted that taking the horizontal axis as dominant would have led to another construction of triangles.
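As a rough illustration of this axis test, the following Python sketch projects the vertices along each axis, orders them clockwise, and scores each axis by the projected polygon area via the shoelace formula; ordering around the mean of the projected vertices (rather than the exact centre of the projected node) is a simplifying assumption, not the normative G-PCC procedure.

```python
import numpy as np

def dominant_axis_and_order(vertices):
    """Select the dominant axis and the clockwise vertex order.

    vertices : (M, 3) array of the TriSoup vertices of one leaf node.
    Each axis is tested by projecting the vertices onto the orthogonal
    2D plane, ordering them around the centre of the projection, and
    measuring the total projected area of the fan triangulation
    (123, 134, ...) with the shoelace formula; the axis with the
    largest area wins.
    """
    best_area, best_axis, best_order = -1.0, None, None
    for axis in range(3):
        keep = [a for a in range(3) if a != axis]
        p = vertices[:, keep].astype(float)           # project onto 2D plane
        c = p.mean(axis=0)                            # centre of the projection
        angles = np.arctan2(p[:, 1] - c[1], p[:, 0] - c[0])
        order = np.argsort(-angles)                   # clockwise ordering
        q = p[order]
        # Shoelace formula over the ordered polygon (wraps via index -1).
        area = 0.5 * abs(sum(q[i - 1, 0] * q[i, 1] - q[i, 0] * q[i - 1, 1]
                             for i in range(len(q))))
        if area > best_area:
            best_area, best_axis, best_order = area, axis, order
    return best_axis, best_order
```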
Selecting the dominant axis by maximizing the projected surface leads to a continuous reconstruction of the point cloud, without holes.
The rendering of TriSoup triangles into points is performed by ray tracing. The set of all points rendered by ray tracing constitutes the decoded point cloud.
For ray tracing, as shown in Figure 9, rays are launched along the three directions parallel to the axes. Their origin is a point of integer (voxelized) coordinates, with a precision corresponding to the sampling precision wanted for the rendering. The intersection (if any, dashed point) with one of the TriSoup triangles is then voxelized (i.e., rounded to the closest point at the wanted sampling precision) and added to the list of rendered points.
After applying TriSoup to all leaf nodes, i.e., constructing triangles and obtaining points by ray tracing, duplicate points in the list of all rendered points are discarded (i.e., only one voxel is kept among all voxels sharing the same position and volume) to obtain a set of decoded (unique) points.
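As an illustration of this rendering and deduplication step, the following Python sketch launches axis-parallel rays at unit sampling precision and collects the voxelized hits in a set; the Möller-Trumbore intersection routine and the sweep bounds are illustrative choices, not the normative G-PCC ray tracer.

```python
import numpy as np

def ray_triangle(orig, direction, v0, v1, v2, eps=1e-9):
    """Möller-Trumbore ray/triangle intersection.

    Returns the intersection point, or None if the ray misses the
    triangle or runs parallel to its plane.
    """
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(direction, e2)
    det = e1.dot(p)
    if abs(det) < eps:
        return None                      # ray parallel to the triangle plane
    inv = 1.0 / det
    s = orig - v0
    u = s.dot(p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = np.cross(s, e1)
    v = direction.dot(q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = e2.dot(q) * inv
    return orig + t * direction if t >= 0.0 else None

def render_points(triangles, lo, hi):
    """Render TriSoup triangles into unique voxelized points.

    Rays are launched along the three axis directions from every integer
    origin of the [lo, hi) grid; each hit is rounded to the nearest
    integer position and duplicates are removed with a set.
    """
    out = set()
    for axis in range(3):
        d = np.zeros(3)
        d[axis] = 1.0
        u_ax, v_ax = [a for a in range(3) if a != axis]
        for u in range(lo, hi):
            for v in range(lo, hi):
                o = np.zeros(3)
                o[axis], o[u_ax], o[v_ax] = lo, u, v
                for v0, v1, v2 in triangles:
                    hit = ray_triangle(o, d, v0, v1, v2)
                    if hit is not None:
                        out.add(tuple(np.rint(hit).astype(int)))
    return out
```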
Based on the basic concepts described above, improvements may be applied to TriSoup coding. For example, a centroid point whose coordinates are the mean coordinates of all (ordered) vertices V_i may be computed; see Figure 10, where the centroid point C is depicted using checkerboard filling. The centroid point is used as a pivoting point. The triangles are constructed from the ordered vertices (V_1, V_2, …, V_M) by pivoting around the centroid point C. The following M triangles are constructed:
· V_1V_2C
· V_2V_3C
· …
· V_{M-1}V_MC
· V_MV_1C
Therein, the vertices are ordered clockwise, and it is not important which of the vertices is chosen as V_1.
This construction preserves the natural symmetries of the model without arbitrarily privileging some triangles. Moreover, it provides an additional degree of freedom to improve the accuracy of the model, namely the position of the centroid point C.
Consequently, the position of the centroid point C may be further improved by coding a residual position in the bitstream, such that the centroid point C is closer to the original points of the point cloud.
For example,

$C = C_{mean} + C_{res}$

where C_mean is the mean position obtained by averaging the coordinates of all (ordered) vertices, and C_res is a coded residual.
The coded residual may be a 3D residual. However, it is observed that a 3D residual is rarely advantageous because it requires many bits to be coded, and these extra bits are not fully compensated by the better accuracy of the model. Therefore, it is preferred to code a 1D residual C_res.
For example, a normal vector $\vec{n}$ may be constructed, as shown in Figure 11, and the residual may be determined by the following equation:

$C_{res} = \alpha \, \vec{n}$

where α is a 1D signed scalar value coded in the bitstream, see Figure 12. The normal vector $\vec{n}$ may be derived by the following two steps:

1. $\vec{n}_{sum} = \sum_{i=1}^{M} \vec{e}_i \times \vec{e}_{i+1}$ (indices taken cyclically, so $\vec{e}_{M+1} = \vec{e}_1$),
2. and then normalization: $\vec{n} = \vec{n}_{sum} / \lVert \vec{n}_{sum} \rVert$,

where × is the cross product (also named vector (cross) product) between two vectors, and the edges $\vec{e}_i$ are:

$\vec{e}_i = V_i - C_{mean}$

It will be understood that the vector $\vec{n}$ may be taken parallel to an axis in order to simplify its determination and the computation of the value α. The vector $\vec{n}$ may be taken parallel to the dominant axis, as a good approximation of the vector computed above.
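Under this reconstruction, the two-step derivation of the node normal can be sketched in Python as follows (ordered vertices as a NumPy array; floating-point arithmetic is a simplifying assumption, a codec would typically use fixed-point):

```python
import numpy as np

def node_normal(vertices, c_mean):
    """Plain (non-weighted) node normal from the ordered TriSoup vertices.

    Implements the two steps above: sum the cross products of consecutive
    edge vectors e_i = V_i - C_mean (indices cyclic), then normalize.
    """
    e = vertices - c_mean                       # edge vectors e_i, shape (M, 3)
    M = len(e)
    n = np.zeros(3)
    for i in range(M):
        n += np.cross(e[i], e[(i + 1) % M])     # e_i x e_{i+1}
    return n / np.linalg.norm(n)
```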
The value α is determined by the encoder, encoded into the bitstream and obtained by the decoder by decoding the bitstream. The value α may be binarized and each bit may be encoded by using a binary entropy coder such as an arithmetic coder or a context adaptive binary coder like CABAC.
The value α may be binarized into:
· a flag f0 indicating if α is equal to 0,
· a sign that indicates if α>0 or α<0,
· a flag f1 indicating if |α| is equal to 1,
· a remainder |α|-2 coded by an exp-Golomb coder; a bit-level sketch of this binarization is given after this list.
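The following sketch illustrates this binarization; the flag polarities and the sign mapping are assumptions for illustration, since only the list of syntax elements is specified above, and in a codec each bin would be fed to a binary entropy coder such as CABAC rather than written as a character.

```python
def exp_golomb(k):
    """Order-0 exp-Golomb code of a non-negative integer k."""
    code = bin(k + 1)[2:]                 # binary representation of k + 1
    return "0" * (len(code) - 1) + code   # prefix of zeros, then the value

def binarize_alpha(alpha):
    """Binarize the centroid residual alpha into f0, sign, f1, remainder."""
    if alpha == 0:
        return "1"                        # f0 = 1 : alpha is zero (assumed polarity)
    bits = "0"                            # f0 = 0 : alpha is non-zero
    bits += "1" if alpha > 0 else "0"     # sign bin (assumed mapping)
    if abs(alpha) == 1:
        return bits + "1"                 # f1 = 1 : |alpha| equals 1
    return bits + "0" + exp_golomb(abs(alpha) - 2)  # remainder |alpha| - 2

# Example: alpha = -3 gives f0=0, sign=0, f1=0, then exp-Golomb of 1 ("010").
assert binarize_alpha(-3) == "000" + "010"
```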
The value α may be determined by the encoder by considering all points P_k of the point cloud belonging to a current leaf node. For each point P_k, its distance d_k from the line $(C_{mean}, \vec{n})$ is found by

$d_k = \lVert (P_k - C_{mean}) \times \vec{n} \rVert$

and in case this distance d_k is below a pre-defined threshold th (e.g., th = 2), the point P_k is used to compute the value α. The 1D residual r_k of a point P_k relative to the mean point C_mean is obtained by the scalar product (also named inner product or dot product)

$r_k = (P_k - C_{mean}) \cdot \vec{n}$

The value α is thus obtained by

$\alpha = \frac{1}{|S|} \sum_{k \in S} r_k$

where S is the set of points P_k such that their distance d_k is below the threshold th, and |S| is the number of points belonging to this set.
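Under the reconstruction above, the encoder-side estimation of α may be sketched as follows; the unit-length normal and the final rounding to an integer (implied by the flag-based binarization of α) are assumptions.

```python
import numpy as np

def estimate_alpha(points, c_mean, n, th=2.0):
    """Encoder-side estimate of the 1D centroid residual alpha.

    points : (N, 3) array of the original points P_k in the leaf node.
    n      : unit normal vector of the leaf node.
    d_k = ||(P_k - C_mean) x n|| is the distance of P_k to the line
    (C_mean, n); points with d_k < th contribute their 1D residual
    r_k = (P_k - C_mean) . n, and alpha is the mean over that set S.
    """
    rel = points - c_mean                         # P_k - C_mean
    d = np.linalg.norm(np.cross(rel, n), axis=1)  # distances d_k
    r = rel @ n                                   # 1D residuals r_k
    selected = r[d < th]                          # residuals of the set S
    return 0 if selected.size == 0 else int(round(selected.mean()))
```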
In a triangle (assume the index of this triangle is i) constructed from vertices and C_mean, the normal vector $\vec{n}_i$ of this triangle can be obtained by the cross product of the edges $\overrightarrow{C_{mean}V_i}$ and $\overrightarrow{C_{mean}V_{i+1}}$. For example, as shown in Figure 13, the normal vector $\vec{n}_1$ of triangle V_1V_2C_mean can be obtained by the cross product of $\overrightarrow{C_{mean}V_1}$ and $\overrightarrow{C_{mean}V_2}$, which can be described by the equation

$\vec{n}_1 = \overrightarrow{C_{mean}V_1} \times \overrightarrow{C_{mean}V_2}$

the normal vector $\vec{n}_2$ of triangle V_2V_3C_mean can be obtained by the cross product of $\overrightarrow{C_{mean}V_2}$ and $\overrightarrow{C_{mean}V_3}$, which can be described by the equation

$\vec{n}_2 = \overrightarrow{C_{mean}V_2} \times \overrightarrow{C_{mean}V_3}$

and similarly, the normal vectors $\vec{n}_3$ of triangle V_3V_4C_mean and $\vec{n}_4$ of triangle V_4V_1C_mean can be obtained by

$\vec{n}_3 = \overrightarrow{C_{mean}V_3} \times \overrightarrow{C_{mean}V_4}$

$\vec{n}_4 = \overrightarrow{C_{mean}V_4} \times \overrightarrow{C_{mean}V_1}$
Then, the normal vector $\vec{n}$ of the leaf node in this example is determined by the following two steps:

1. calculate the sum of the normal vectors $\vec{n}_i$ of each triangle (V_1V_2C_mean, V_2V_3C_mean, …, V_{M-1}V_MC_mean, V_MV_1C_mean):

$\vec{n}_{sum} = \sum_{i=1}^{M} \vec{n}_i$

2. and then calculate the normalization of the normal vector:

$\vec{n} = \vec{n}_{sum} / \lVert \vec{n}_{sum} \rVert$

As described above, the normal vector $\vec{n}$ of a leaf node used for TriSoup coding is determined by using the sum of the normal vectors $\vec{n}_i$ of the triangles constructed from the vertices and C_mean. The residual α is then found along the line constructed from the mean point C_mean and the normal vector $\vec{n}$; the aim is to let the position of the centroid point C be closer to the original points of the point cloud in the leaf node, wherein

$C_{res} = \alpha \, \vec{n}$

$C = C_{mean} + C_{res}$
However, in this sum calculation, the weight of each normal vector $\vec{n}_i$ is not considered. This may cause the obtained normal vector $\vec{n}$ of each leaf node (used for TriSoup coding) to be non-optimal, because in many cases the number of points in a leaf node is not the same for each triangle of the leaf node, for example when the surface of a point cloud has many concave areas and convex areas.
Thus, simply summing the normal vectors $\vec{n}_i$ of the triangles may not yield the optimal normal vector $\vec{n}$ of the leaf node, and that will cause a non-optimal position of the centroid point C. Further, the positions of the reconstructed points may have larger errors relative to the original points of the point cloud, which will cause a lower compression efficiency.
Thus, it is proposed to estimate the normal vector $\vec{n}$ of a leaf node based on a weighted average of the normal vectors $\vec{n}_i$ of the triangles in the leaf node, to achieve a better compression efficiency. The weight may be determined based on the area of each triangle constructed from the vertices and C_mean. By doing so, the normal vector $\vec{n}_i$ of a triangle that has a larger area will have a bigger impact on the determination of the normal vector $\vec{n}$. Therefore, the obtained normal vector $\vec{n}$ will tend to be closer to the direction of the original points than the normal vector obtained by the plain sum, since a triangle with a larger area is more likely to contain more points of the original point cloud. Thus, by making use of the area information of each triangle, the obtained normal vector $\vec{n}$ can be refined to point in a direction where more original points may exist in the point cloud. As a result, a more optimal position of the centroid point C is obtained, which causes smaller errors for the reconstructed point positions, so the compression efficiency can be further improved.
In some embodiments, the normal vector $\vec{n}$ of a leaf node is obtained by using a weighted average based on the area of each triangle; the normal vector $\vec{n}$ of the leaf node may then be determined by the following two steps:

1. calculate the weighted average $\vec{n}_w$ of the normal vectors $\vec{n}_i$ of the triangles by the following equation:

$\vec{n}_w = \sum_{i=1}^{M} \frac{A_i}{A_{Sum}} \, \vec{n}_i$

2. normalize the vector $\vec{n}_w$ to get the normal vector $\vec{n}$:

$\vec{n} = \vec{n}_w / \lVert \vec{n}_w \rVert$

wherein A_i represents the area of the i-th triangle in the leaf node, A_Sum = A_1 + A_2 + … + A_M is the sum of the areas of all triangles, and M is the number of triangles constructed in the leaf node for TriSoup coding.
In some embodiments, the weight used in the determination of the normal vector $\vec{n}$ is based on the square root of the area of each triangle in the leaf node; the weighted average of the normal vectors $\vec{n}_i$ is then obtained by

$\vec{n}_w = \sum_{i=1}^{M} \frac{\sqrt{A_i}}{A_{Sum}} \, \vec{n}_i$

wherein $A_{Sum} = \sqrt{A_1} + \sqrt{A_2} + \dots + \sqrt{A_M}$.
In some embodiments, the weight used in the determination of the normal vector $\vec{n}$ is based on the square of the area of each triangle in the leaf node; the weighted average of the normal vectors $\vec{n}_i$ is then obtained by

$\vec{n}_w = \sum_{i=1}^{M} \frac{A_i^2}{A_{Sum}} \, \vec{n}_i$

wherein $A_{Sum} = A_1^2 + A_2^2 + \dots + A_M^2$.
In a more general way, the weight used in the determination of the normal vector $\vec{n}$ is based on an exponent p of the area of each triangle in the leaf node; the weighted average of the normal vectors $\vec{n}_i$ is then obtained by

$\vec{n}_w = \sum_{i=1}^{M} \frac{A_i^{\,p}}{A_{Sum}} \, \vec{n}_i$

wherein $A_{Sum} = A_1^{\,p} + A_2^{\,p} + \dots + A_M^{\,p}$, and the parameter p controls the impact of the triangle areas on the normal vector determination. It can be set depending on the kind of point cloud data in order to achieve the best compression efficiency; for example, p can be set larger than 1 for point cloud data that has many concave and convex areas.
In some embodiments, the area of each triangle (V_1V_2C_mean, V_2V_3C_mean, …, V_{M-1}V_MC_mean, V_MV_1C_mean) can be obtained from the cross product (also named vector (cross) product) between the two edge vectors $\overrightarrow{C_{mean}V_j}$ and $\overrightarrow{C_{mean}V_{j+1}}$. For example, the area A_1 of the 1st triangle can be obtained by

$A_1 = \frac{1}{2} \lVert \overrightarrow{C_{mean}V_1} \times \overrightarrow{C_{mean}V_2} \rVert = \frac{1}{2} \lVert \vec{n}_1 \rVert$

since $\lVert \overrightarrow{C_{mean}V_1} \times \overrightarrow{C_{mean}V_2} \rVert$ is the area of a parallelogram, as shown in Figure 14, and the area of triangle V_1V_2C_mean is half of that parallelogram.

In general, the area A_i of the i-th triangle can be obtained by

$A_i = \frac{1}{2} \lVert \overrightarrow{C_{mean}V_j} \times \overrightarrow{C_{mean}V_{j+1}} \rVert = \frac{1}{2} \lVert \vec{n}_j \rVert$

wherein i = j; different index variables are used here only to distinguish the index of a triangle from the index of the vertices.
In some embodiments, the area of each triangle can simply be taken as

$A_i = \lVert \vec{n}_i \rVert$

since the normal vector $\vec{n}$ is the result of an area-weighted average, so the constant scale factor cancels out of the weighted-average equation. For example,

$\vec{n}_w = \sum_{i=1}^{M} \frac{\frac{1}{2} \lVert \vec{n}_i \rVert}{A_{Sum}} \, \vec{n}_i$ with $A_{Sum} = \frac{1}{2} \lVert \vec{n}_1 \rVert + \dots + \frac{1}{2} \lVert \vec{n}_M \rVert$

will become

$\vec{n}_w = \sum_{i=1}^{M} \frac{\lVert \vec{n}_i \rVert}{A_{Sum}} \, \vec{n}_i$ with $A_{Sum} = \lVert \vec{n}_1 \rVert + \dots + \lVert \vec{n}_M \rVert$
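Putting these pieces together, a minimal Python sketch of the proposed area-weighted normal determination is given below. The generic exponent p covers the embodiments above (p = 1, 0.5 and 2), $\lVert \vec{n}_i \rVert$ is used directly as A_i per the simplification just described, and floating-point arithmetic is an assumption; a codec would typically operate in fixed point.

```python
import numpy as np

def weighted_node_normal(vertices, c_mean, p=1.0):
    """Area-weighted normal vector of one leaf node, as proposed above.

    vertices : (M, 3) array of ordered TriSoup vertices V_1 ... V_M.
    c_mean   : (3,) mean position of the vertices.
    p        : exponent controlling the impact of the triangle areas.
    Each triangle normal n_i is the cross product of the two edge
    vectors from C_mean; ||n_i|| (twice the triangle area, the constant
    factor cancelling in the average) serves as the area A_i.
    """
    e = vertices - c_mean                                        # edge vectors
    M = len(e)
    tri_n = [np.cross(e[i], e[(i + 1) % M]) for i in range(M)]   # n_i
    areas = np.array([np.linalg.norm(n_i) for n_i in tri_n])     # A_i
    w = areas ** p                                               # weights A_i^p
    n = sum(w_i * n_i for w_i, n_i in zip(w, tri_n)) / w.sum()   # weighted average
    return n / np.linalg.norm(n)                                 # normalize
```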
According to the present invention, the proposed determination method of the normal vector $\vec{n}$ of a leaf node can be used both in the encoding process and in the decoding process of TriSoup coding.
According to some embodiments, the TriSoup encoding process follows these steps:
· Firstly, it determines the vertex information of each leaf node, which includes the vertex presence flag of each edge of the leaf node and the vertex position on an edge (if the vertex presence flag of the edge is true), and then it encodes this TriSoup information into the bitstream by entropy coding.
· Then, to encode the information on the position of the centroid point C of each leaf node into the bitstream, it iterates over the leaf nodes. For each leaf node whose number of vertices is larger than 2:
· it calculates C_mean by using the obtained vertices in the leaf node,
· then it calculates the normal vector $\vec{n}$ of the leaf node based on C_mean and the obtained vertices, by using the proposed determination method of the normal vector $\vec{n}$ (i.e., based on the weighted areas of the triangles),
· then it obtains the centroid residual value α and encodes it into the bitstream by entropy coding.
According to some embodiments, the TriSoup decoding process follows these steps:
· Firstly, it decodes the vertex information of each leaf node from the bitstream, which includes the vertex presence flag of each edge of the leaf node and the vertex position on an edge (if the vertex presence flag of the edge is true).
· Then, to reconstruct the information on the position of the centroid point C of each leaf node, it iterates over the leaf nodes. For each leaf node whose number of vertices is larger than 2:
· it calculates C_mean by using the decoded vertices in the leaf node,
· then it calculates the normal vector $\vec{n}$ of the leaf node based on C_mean and the decoded vertices, by using the proposed determination method of the normal vector $\vec{n}$ (i.e., based on the weighted areas of the triangles),
· then it decodes the centroid residual value α from the bitstream, and the position of the centroid point C of the leaf node can be reconstructed by $C = C_{mean} + \alpha \, \vec{n}$.
· Finally, it constructs the triangles (V_1V_2C, V_2V_3C, …, V_{M-1}V_MC, V_MV_1C) in the leaf node and then reconstructs the points in the leaf node by applying the ray tracing method to the constructed triangles; a minimal sketch of this centroid reconstruction is given after this list.
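A minimal Python sketch of the decoder-side centroid reconstruction is given below; it reuses the weighted_node_normal() helper from the earlier sketch and assumes that the residual α has already been entropy-decoded from the bitstream.

```python
import numpy as np

def reconstruct_centroid(vertices, alpha, p=1.0):
    """Decoder-side reconstruction of the centroid point C of one leaf node.

    Mirrors the steps above: C_mean from the decoded (ordered) vertices,
    the area-weighted node normal, then C = C_mean + alpha * n. The
    triangles (V1 V2 C, ..., VM V1 C) would subsequently be rendered into
    points by ray tracing. A sketch under the assumptions stated above.
    """
    c_mean = vertices.mean(axis=0)                 # mean of decoded vertices
    n = weighted_node_normal(vertices, c_mean, p)  # area-weighted unit normal
    return c_mean + alpha * n                      # centroid position C
```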

Claims (9)

  1. Method for decoding, from a bitstream, the geometry of a 3D point cloud, preferably implemented in a decoder, including:
    receiving and decoding a bitstream, wherein the bitstream contains octree information including information about octree structure of the volume of the point cloud and vertex information including information about vertex presence and position of a vertex on edges of cuboids of leaf nodes of the octree structure;
    determining a virtual position by averaging the positions of vertices of one cuboid relating to a leaf node of the octree structure;
    constructing triangles by connecting clockwise two consecutive vertices of one cuboid and the virtual position;
    determining a normal vector of the cuboid based on the constructed triangles;
    determining a centroid position based on the normal vector;
    reconstructing triangles by connecting clockwise two consecutive vertices of the cuboid and the centroid position;
    voxelization of the reconstructed triangles to determine points of the point cloud,
    wherein the normal vector of the cuboid is determined based on areas of the determined triangles.
  2. Method for encoding a 3D point cloud into a bitstream, preferably implemented in an encoder, including:
    obtaining octree information including an octree structure of a volume including a plurality of cuboids;
    obtaining vertex information from surfaces of the point cloud for each cuboid relating to a leaf node, wherein the vertex information includes information about vertex presence and position of a vertex on edges of the cuboid;
    encoding the octree information and the vertex information into a bitstream;
    reconstructing the point cloud geometry data by using octree information and vertex information obtained in the preceding encoding process, wherein reconstructing the point cloud data includes:
    determining a virtual position by averaging the positions of vertices of one cuboid relating to a leaf node of the octree structure;
    constructing triangles by connecting clockwise two consecutive vertices of one cuboid and the virtual position;
    determining a normal vector of the cuboid based on the constructed triangles;
    determining a centroid position based on the virtual position and the normal vector;
    reconstructing triangles by connecting clockwise two consecutive vertices of the cuboid and the centroid position;
    voxelization of the reconstructed triangles to determine points of the point cloud,
    wherein the normal vector of the cuboid is determined based on areas of the determined triangles.
  3. The method according to claim 1, wherein the bitstream further contains an additional value α and the centroid position is further determined based on the additional value α.
  4. The method according to claim 2, further comprising: determining an additional value α; encoding the additional value into the bitstream; wherein the centroid position is further determined based on the additional value α.
  5. The method according to claim 3 or 4, wherein the additional value α is determined based on the positions of all the points of the point cloud.
  6. Encoder to encode a 3D point cloud into a bitstream, comprising at least one processor and a memory, wherein the memory stores instructions that, when executed by the processor, perform the steps of the method according to any of claims 2, 4 and 5 when dependent on claim 2.
  7. Decoder to decode a 3D point cloud from a bitstream, comprising at least one processor and a memory, wherein the memory stores instructions that, when executed by the processor, perform the steps of the method according to any of claims 1, 3 and 5 when dependent on claim 1.
  8. Bitstream encoded by the method according to any of claims 2, 4 and 5 when dependent on claim 2.
  9. Computer-readable storage medium comprising instructions that, when executed by a processor, perform the steps of the method according to any of claims 1 to 5.