WO2024082105A1

WO2024082105A1 - Encoding method, decoding method, decoder, encoder and computer-readable storage medium

Info

Publication number: WO2024082105A1
Application number: PCT/CN2022/125742
Authority: WO
Inventors: 马展; 薛瑞翔; 魏红莲
Original assignee: Oppo广东移动通信有限公司
Priority date: 2022-10-17
Filing date: 2022-10-17
Publication date: 2024-04-25

Abstract

Provided in the embodiments of the present application are an encoding method, a decoding method, a decoder, an encoder and a computer-readable storage medium, which can improve the encoding and decoding performance. The decoding method comprises: parsing a code stream to determine coded information corresponding to a second-scale point cloud, and determining a first-scale point cloud; performing local density prediction on the basis of the first-scale point cloud to determine a local density corresponding to first-scale voxels in the first-scale point cloud, and performing occupancy probability prediction on second-scale voxels to determine an occupancy probability corresponding to the second-scale voxels, the second-scale voxels being up-sampling voxels corresponding to the first-scale voxels, and the local density representing the number of occupied second-scale voxels amongst the second-scale voxels corresponding to the first-scale voxels; and, on the basis of the occupancy probability corresponding to the second-scale voxels and the local density corresponding to the first-scale voxels, decoding and reconstructing the coded information corresponding to the second-scale point cloud, so as to determine reconstructed geometric information corresponding to the second-scale point cloud.

Description

Coding and decoding method, decoder, encoder and computer readable storage medium

Technical Field

The present application relates to point cloud compression coding and decoding technology, and in particular to a coding and decoding method, a decoder, an encoder and a computer-readable storage medium.

Background

A point cloud is a collection of points that stores the geometric position and associated attribute information of each point, so that objects in space can be described accurately. Point cloud data is large: a single frame can contain millions of points, which makes efficient storage and transmission difficult. Compression is therefore used to reduce redundant information in point cloud storage and to facilitate subsequent processing.

At present, representative point cloud compression algorithms include: Video-based Point Cloud Compression (V-PCC) and Geometry-based Point Cloud Compression (G-PCC). The geometric compression in G-PCC is mainly implemented through octree models and/or triangular surface models. V-PCC is mainly implemented through three-dimensional to two-dimensional projection and video compression. In the above two compression algorithms, the structural information of the real environment expressed by the point cloud is restored by reconstructing the geometric information of the point cloud. Generally, the process of reconstructing the geometric information of the point cloud includes: using a sparse convolutional neural network, taking the voxelized geometric information of the low-scale point cloud as input, and predicting the occupancy probability of each high-scale voxel in the high-scale point cloud. Then, according to the occupancy probability of each high-scale voxel, the occupancy symbol of each high-scale voxel is determined, and the geometric information of the high-scale point cloud is reconstructed according to the occupancy symbol representing the occupied high-scale voxels.

However, the method of geometric reconstruction based on occupancy probability prediction is prone to inaccurate occupancy symbol determination when the occupancy probabilities of multiple high-scale voxels are close or the occupancy probability threshold is set unreasonably, thereby reducing the quality of geometric reconstruction and further reducing the encoding and decoding performance.

Summary of the invention

The embodiments of the present application provide a coding and decoding method, a decoder, an encoder and a computer-readable storage medium, which can improve the quality of geometric reconstruction of point cloud coding and decoding, thereby improving coding and decoding performance.

The technical solution of this application is implemented as follows:

The present application provides a decoding method, including:

Parsing the bitstream to determine the encoding information corresponding to the second-scale point cloud, and determining the first-scale point cloud; the first-scale point cloud is the previously decoded point cloud data corresponding to the second-scale point cloud;

Performing local density prediction based on the first-scale point cloud to determine the local density corresponding to a first-scale voxel in the first-scale point cloud, and performing occupancy probability prediction on a second-scale voxel to determine the occupancy probability corresponding to the second-scale voxel; the second-scale voxel is an upsampled voxel corresponding to the first-scale voxel; the local density represents the number of occupied second-scale voxels among the second-scale voxels corresponding to the first-scale voxel;

Decoding and reconstructing the encoding information corresponding to the second-scale point cloud based on the occupancy probability corresponding to the second-scale voxel and the local density corresponding to the first-scale voxel, to determine the reconstructed geometric information corresponding to the second-scale point cloud.

The present application provides an encoding method, including:

Downsampling the second-scale point cloud to determine the first-scale point cloud, and upsampling the first-scale voxels in the first-scale point cloud to the second scale to determine the second-scale voxels corresponding to the first-scale voxels;

Performing local density prediction based on the first-scale point cloud to determine the local density corresponding to the first-scale voxel, and performing occupancy probability prediction on the second-scale voxel to determine the occupancy probability corresponding to the second-scale voxel; the local density represents the number of occupied second-scale voxels in the second-scale voxels corresponding to the first-scale voxels;

Determining reconstructed geometric information corresponding to the second-scale point cloud according to the local density corresponding to the first-scale voxel and the occupancy probability corresponding to the second-scale voxel;

Encoding is performed based on the reconstructed geometric information corresponding to the second-scale point cloud, encoding information corresponding to the second-scale point cloud is determined, and the encoding information is written into a bitstream.
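
As a minimal sketch of the downsampling step and of the quantity that the local density represents, the following assumes a downsampling factor of 2 and non-negative integer voxel coordinates. The function name `downsample` and the sample data are hypothetical. The method described here predicts the local density from the first-scale point cloud; this sketch only computes the reference count that such a prediction is meant to approximate.

```python
from collections import Counter

def downsample(second_scale_voxels):
    """Halve integer voxel coordinates (factor 2 is an assumption).

    Returns the set of occupied first-scale voxels and, for each of them,
    the number of occupied second-scale children (between 1 and 8).
    """
    density = Counter(
        (x // 2, y // 2, z // 2) for x, y, z in set(second_scale_voxels)
    )
    return set(density), dict(density)

# Hypothetical second-scale point cloud given as occupied voxel coordinates.
second_scale = {(0, 0, 0), (1, 0, 0), (1, 1, 1), (2, 3, 0)}
first_scale, local_density = downsample(second_scale)
print(first_scale)
print(local_density)
```

Three of the sample voxels share the parent (0, 0, 0), so its local density is 3, while the parent (1, 1, 0) has a local density of 1.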

The present application provides a decoder, including:

A parsing part, configured to parse the bitstream and determine the encoding information corresponding to the second scale point cloud;

The determining part is configured to determine a first-scale point cloud; the first-scale point cloud is the previously decoded point cloud data corresponding to the second-scale point cloud;

A local density prediction part is configured to perform local density prediction based on the first-scale point cloud to determine the local density corresponding to the first-scale voxel in the first-scale point cloud;

The occupancy probability prediction part is configured to perform occupancy probability prediction on a second-scale voxel based on the first-scale point cloud, and determine the occupancy probability corresponding to the second-scale voxel; the second-scale voxel is an upsampled voxel corresponding to the first-scale voxel;

The decoding and reconstruction part is configured to decode and reconstruct the encoded information corresponding to the second-scale point cloud based on the occupancy probability corresponding to the second-scale voxel and the local density corresponding to the first-scale voxel, and determine the reconstructed geometric information corresponding to the second-scale point cloud.

The present application provides an encoder, including:

A downsampling part, configured to perform voxel downsampling on the second scale point cloud to determine a first scale point cloud;

A local density prediction part, configured to perform local density prediction based on the first-scale point cloud to determine the local density corresponding to the first-scale voxel;

The occupancy probability prediction part is configured to upsample the first scale voxels in the first scale point cloud to the second scale, determine the second scale voxels corresponding to the first scale voxels; and perform occupancy probability prediction on the second scale voxels to determine the occupancy probability corresponding to the second scale voxels;

A reconstruction part, configured to determine the reconstructed geometric information corresponding to the second-scale point cloud according to the local density corresponding to the first-scale voxel and the occupancy probability corresponding to the second-scale voxel;

The encoding part is configured to perform encoding based on the reconstructed geometric information corresponding to the second-scale point cloud, determine the encoding information corresponding to the second-scale point cloud, and write the encoding information into a bitstream.

The embodiment of the present application provides a code stream, including:

The code stream is generated by bit encoding according to the coding information; wherein the coding information at least includes: coding information corresponding to the second-scale point cloud.

The present application provides a decoder, including:

A first memory configured to store executable instructions;

The first processor is configured to implement any of the decoding methods described above when executing the executable instructions stored in the first memory.

The present application provides an encoder, including:

a second memory configured to store executable instructions;

The second processor is configured to implement any of the encoding methods described above when executing the executable instructions stored in the second memory.

An embodiment of the present application provides a computer-readable storage medium storing executable instructions for causing a first processor to execute to implement the above-mentioned decoding method, or for causing a second processor to execute to implement the above-mentioned encoding method.

An embodiment of the present application provides a computer program product, including a computer program or instructions. When the computer program or instructions are executed by a first processor, the decoding method provided by the embodiment of the present application is implemented; or, when the computer program or instructions are executed by a second processor, the encoding method provided by the embodiment of the present application is implemented.

The embodiments of the present application provide a coding and decoding method, a decoder, an encoder and a computer-readable storage medium. By predicting the local density, the decoder can determine how many of the second-scale voxels obtained by upsampling each first-scale voxel are occupied. The occupancy probabilities corresponding to the second-scale voxels can then be screened in combination with the local density to determine the occupancy of the second-scale voxels, and the second-scale point cloud is reconstructed according to that occupancy to determine its reconstructed geometric information. The occupancy determined for the second-scale voxels is therefore more accurate, which improves the accuracy of the reconstructed geometric information of the second-scale point cloud, that is, the geometric reconstruction quality and the decoding performance of the decoder. Likewise, in the encoder, screening the occupancy probabilities corresponding to the second-scale voxels based on the local density corresponding to the first-scale voxels improves the accuracy of the determined occupancy, and hence the accuracy of encoding based on the reconstructed geometric information of the second-scale point cloud, thereby improving the encoding performance.
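
One plausible reading of "screening the occupancy probabilities in combination with the local density" is a top-k selection: among the eight second-scale voxels of one first-scale voxel, the k most probable ones are marked occupied, where k is the predicted local density. The sketch below illustrates that reading only; the function name `select_occupied` and the sample probabilities are hypothetical, and the text up to this point does not state the exact screening rule.

```python
def select_occupied(child_probs, local_density):
    """Mark the `local_density` most probable children as occupied (1).

    Top-k selection is an assumed interpretation of the screening step.
    """
    k = max(0, min(local_density, len(child_probs)))
    # Indices of the children, most probable first (ties keep index order).
    ranked = sorted(
        range(len(child_probs)), key=lambda i: child_probs[i], reverse=True
    )
    occupied = set(ranked[:k])
    return [1 if i in occupied else 0 for i in range(len(child_probs))]

# Hypothetical occupancy probabilities of the eight second-scale voxels.
probs = [0.52, 0.48, 0.51, 0.10, 0.05, 0.49, 0.02, 0.01]
symbols = select_occupied(probs, 3)
print(symbols)  # [1, 0, 1, 0, 0, 1, 0, 0]
```

With this rule the number of occupied children always equals the local density, regardless of how close the individual probabilities are to one another.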

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flow chart of G-PCC coding;

FIG. 2 is a flow chart of G-PCC decoding;

FIG. 3 is a schematic diagram of an optional flow chart of a decoding method provided in an embodiment of the present application;

FIG. 4 is a schematic diagram of an optional process of voxel upsampling provided in an embodiment of the present application;

FIG. 5 is a schematic diagram of an optional flow chart of a decoding method provided in an embodiment of the present application;

FIG. 6 is a schematic diagram of an optional flow chart of an occupancy probability prediction and local density prediction process provided in an embodiment of the present application;

FIG. 7 is a schematic diagram of an optional flow chart of an occupancy probability prediction and local density prediction process provided in an embodiment of the present application;

FIG. 8 is a schematic diagram of an optional flow chart of an occupancy probability prediction and local density prediction process provided in an embodiment of the present application;

FIG. 9 is a schematic diagram of an optional structure of a local density prediction network provided in an embodiment of the present application;

FIG. 10 is a schematic diagram of an optional structure of a feature extraction network provided in an embodiment of the present application;

FIG. 11 is a schematic diagram of an optional structure of a residual layer in a feature extraction network provided in an embodiment of the present application;

FIG. 12 is a schematic diagram of an optional structure of an occupancy probability prediction network provided in an embodiment of the present application;

FIG. 13 is a schematic diagram of an occupancy probability corresponding to a second-scale voxel provided in an embodiment of the present application;

FIG. 14 is a schematic diagram of an optional flow chart of a decoding method provided in an embodiment of the present application;

FIG. 15 is a schematic diagram of an optional process for reconstructing point cloud geometric information based on occupancy probability and local density provided in an embodiment of the present application;

FIG. 16 is a schematic diagram of an optional flow chart of an encoding method provided in an embodiment of the present application;

FIG. 17 is a schematic diagram of an optional process of voxel downsampling provided in an embodiment of the present application;

FIG. 18 is a schematic diagram of an occupancy symbol obtained by voxel downsampling according to an embodiment of the present application;

FIG. 19 is a schematic diagram of an optional process of applying the decoding method provided in an embodiment of the present application to an actual scenario;

FIG. 20 is a schematic diagram of an optional structure of a decoder provided in an embodiment of the present application;

FIG. 21 is a schematic diagram of an optional structure of an encoder provided in an embodiment of the present application;

FIG. 22 is a schematic diagram of an optional structure of a decoder provided in an embodiment of the present application;

FIG. 23 is a schematic diagram of an optional structure of an encoder provided in an embodiment of the present application.

Detailed Description

In order to make the purpose, technical solutions and advantages of the present application clearer, the present application will be further described in detail below in conjunction with the accompanying drawings. The described embodiments should not be regarded as limiting the present application. All other embodiments obtained by ordinary technicians in the field without making creative work are within the scope of protection of this application.

In the following description, reference is made to “some embodiments”, which describe a subset of all possible embodiments, but it will be understood that “some embodiments” may be the same subset or different subsets of all possible embodiments and may be combined with each other without conflict.

In the following description, the terms "first/second/third" are merely used to distinguish similar objects and do not represent a specific ordering of the objects. It can be understood that "first/second/third" can be interchanged in a specific order or sequence where permitted, so that the embodiments of the present application described here can be implemented in an order other than that illustrated or described here.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as those commonly understood by those skilled in the art to which this application belongs. The terms used herein are only for the purpose of describing the embodiments of this application and are not intended to limit this application.

Before further describing the embodiments of the present application in detail, the nouns and terms involved in the embodiments of the present application are explained. The nouns and terms involved in the embodiments of the present application are subject to the following interpretations.

1) Voxel: A voxel (short for volume element) is the smallest unit of digital data in a partition of three-dimensional space. Voxels can be used to divide 3D space into a grid and to assign features to each grid cell. For example, a voxel can be a cube of fixed size in three-dimensional space. Voxels are widely used in fields such as three-dimensional imaging, scientific data and medical imaging.

Point cloud compression algorithms include: Video-based Point Cloud Compression (V-PCC) and Geometry-based Point Cloud Compression (G-PCC). Among them, the geometry compression in G-PCC is mainly implemented through the octree model and/or the triangle surface model. V-PCC is mainly implemented through 3D to 2D projection and video compression.

With the development of artificial intelligence technology, neural networks have been applied to geometry-based point cloud compression. Neural-network-based point cloud geometry compression can be roughly divided into lossy and lossless geometry compression. Lossless compression algorithms mainly revolve around the design of a prediction model for voxel occupancy probability. Voxel data is usually represented with octree models, volume models, sparse tensor representations and the like. For lossless geometry compression, the encoder typically takes the surrounding context, such as parent nodes and neighbor nodes, as input, processes it with neural network layers (such as convolutional or fully connected layers), and outputs the occupancy probability of each voxel in the geometric data of the point cloud. An entropy encoder then uses the occupancy probability of each voxel to convert the corresponding voxel occupancy symbol into a bitstream. Correspondingly, the decoder predicts the occupancy probability of each voxel by the same process and decodes the voxel occupancy symbols from the bitstream based on the predicted occupancy probabilities, in order to reconstruct the geometric data of the point cloud.
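
The benefit of a good occupancy probability model for entropy coding can be illustrated with the ideal code length: a binary symbol coded with probability p costs about -log2(p) bits. The following sketch estimates that cost for a list of occupancy symbols. It is a simplified illustration and not an arithmetic coder; the function name `estimated_bits` and the sample values are hypothetical.

```python
import math

def estimated_bits(symbols, probs):
    """Ideal entropy-coding cost in bits for binary occupancy symbols.

    probs[i] is the predicted probability that voxel i is occupied.
    """
    total = 0.0
    for s, p in zip(symbols, probs):
        p = min(max(p, 1e-9), 1 - 1e-9)  # clamp to avoid log(0)
        total += -math.log2(p if s == 1 else 1 - p)
    return total

symbols = [1, 0, 1, 0]
good = estimated_bits(symbols, [0.9, 0.1, 0.9, 0.1])        # confident, correct
uninformed = estimated_bits(symbols, [0.5, 0.5, 0.5, 0.5])  # no prior knowledge
print(good, uninformed)
```

An uninformed model spends one bit per symbol, while a model whose predictions agree with the actual symbols spends far less, which is why both encoder and decoder must run the same prediction.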

It can be seen that if the occupancy symbol of a voxel is determined from the occupancy probability alone, the determination is prone to errors when the occupancy probabilities of multiple adjacent voxels are close to one another or when the occupancy probability threshold is set unreasonably. Especially when the point cloud density is unevenly distributed, the number of occupied voxels indicated by the occupancy symbols can easily differ from the actual number of occupied voxels, so that the reconstructed geometric information contains too many or too few points. This reduces the quality of geometric reconstruction and, in turn, the encoding and decoding performance.
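
The sensitivity to the threshold can be seen in a small experiment. The sketch below applies a fixed threshold to eight hypothetical child-voxel probabilities, several of which are close to 0.5, and prints how many children each threshold marks as occupied. The function name, the probabilities and the assumed true count of 3 are illustrative only.

```python
def threshold_occupancy(child_probs, threshold=0.5):
    """Baseline rule: a child voxel is occupied if its probability
    exceeds the threshold."""
    return [1 if p > threshold else 0 for p in child_probs]

# Hypothetical probabilities; four of them lie within 0.02 of 0.5.
probs = [0.52, 0.48, 0.51, 0.10, 0.05, 0.49, 0.02, 0.01]
true_count = 3  # assumed ground-truth number of occupied children

for t in (0.45, 0.5, 0.515):
    count = sum(threshold_occupancy(probs, t))
    print(f"threshold={t}: {count} occupied (expected {true_count})")
```

None of the three thresholds yields exactly three occupied children here: small shifts in the threshold change the count from four to two to one.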

The embodiments of the present application provide a coding and decoding method, a decoder, an encoder and a computer-readable storage medium, which can improve coding and decoding efficiency and improve coding and decoding performance. In order to facilitate the understanding of the technical solution provided by the embodiments of the present application, a flow chart of G-PCC encoding and a flow chart of G-PCC decoding are first provided. It should be noted that the flow chart of G-PCC encoding and the flow chart of G-PCC decoding described in the embodiments of the present application are only for more clearly illustrating the technical solution of the embodiments of the present application, and do not constitute a limitation on the technical solution provided by the embodiments of the present application. It is known to those skilled in the art that with the evolution of point cloud compression technology and the emergence of new business scenarios, the technical solution provided in the embodiments of the present application is also applicable to point cloud coding and decoding architectures similar to G-PCC. The point cloud compressed in the embodiments of the present application can be a point cloud in a video, but is not limited to this.

In the point cloud G-PCC encoder framework, the point cloud of the input 3D image model is sliced and each slice is encoded independently.

As shown in the flowchart of G-PCC coding in FIG. 1, which is applied to the encoder, the point cloud data to be encoded is first divided into multiple slices. In each slice, the geometric information and the attribute information of the point cloud are encoded separately. In the geometric coding process, the geometric information is coordinate-transformed so that the whole point cloud is contained in a bounding box, and is then quantized. Quantization mainly serves to scale the coordinates. Because quantization rounds coordinates, some points end up with the same geometric information, and parameters determine whether such duplicate points are removed. The process of quantization and duplicate-point removal is also called voxelization. The bounding box is then divided by an octree. In octree-based geometric information coding, the bounding box is divided into 8 sub-cubes, and each non-empty sub-cube (one containing points of the point cloud) is again divided into 8 equal parts, until the leaf nodes obtained by the division are 1x1x1 unit cubes. The division then stops, and the points in the leaf nodes are arithmetically encoded to generate a binary geometric bit stream, that is, a geometric code stream. In geometric information coding based on triangle soup (trisoup), octree division is also performed first. However, unlike octree-based geometric information coding, trisoup does not divide the point cloud step by step down to unit cubes with a side length of 1x1x1; instead, the division stops when the side length of the sub-block is W. Based on the surface formed by the distribution of the point cloud in each block, at most twelve intersections (vertices) between the surface and the twelve edges of the block are obtained, and the vertices are arithmetically encoded (surface fitting based on the intersections) to generate a binary geometric bit stream, that is, a geometric code stream. The vertices are also used in the geometric reconstruction process, and the reconstructed geometric information is used when encoding the attributes of the point cloud.
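
The octree division described above can be sketched as follows: starting from a cubic bounding box, each non-empty node is split into eight sub-cubes until unit cubes are reached, and each split is summarized by an 8-bit occupancy code. This is a simplified illustration that assumes non-negative integer coordinates and a power-of-two box size. The bit ordering and breadth-first traversal are assumptions rather than the G-PCC syntax, and the arithmetic coding of the codes is omitted.

```python
def octree_occupancy(points, size):
    """Breadth-first octree over the cube [0, size)^3 (size a power of two).

    Returns one 8-bit occupancy code per non-empty internal node.
    """
    codes = []
    level = [((0, 0, 0), size, list(set(points)))]
    while level and level[0][1] > 1:
        next_level = []
        for (ox, oy, oz), s, pts in level:
            half = s // 2
            children = {}
            for x, y, z in pts:
                # Child index: one bit per axis (assumed bit order x, y, z).
                idx = ((x >= ox + half) << 2) | ((y >= oy + half) << 1) | (z >= oz + half)
                children.setdefault(idx, []).append((x, y, z))
            code = 0
            for idx in sorted(children):
                code |= 1 << idx
                origin = (ox + half * ((idx >> 2) & 1),
                          oy + half * ((idx >> 1) & 1),
                          oz + half * (idx & 1))
                next_level.append((origin, half, children[idx]))
            codes.append(code)
        level = next_level
    return codes

print(octree_occupancy([(0, 0, 0), (3, 3, 3)], 4))  # [129, 1, 128]
```

In the example, the root has two non-empty children (bits 0 and 7, code 129), and each of those contains a single occupied unit cube.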

In the attribute encoding process, color conversion is performed to convert the color information (i.e., attribute information) from the RGB color space to the YUV color space. Then, the point cloud is recolored using the reconstructed geometric information so that the unencoded attribute information corresponds to the reconstructed geometric information. In the color information encoding process, there are two main transformation methods. One is the distance-based lifting transformation that relies on the level of detail (LOD) division, and the other is the direct transformation of the region adaptive hierarchical transformation (RAHT). Both methods will convert the color information from the spatial domain to the frequency domain, obtain high-frequency coefficients and low-frequency coefficients through the transformation, and finally quantize the coefficients (i.e., quantized coefficients). Finally, the geometric encoding data after octree division and surface fitting and the attribute encoding data processed by the quantized coefficients are sliced and synthesized, and the vertex coordinates of each block are encoded in turn (i.e., arithmetic encoding) to generate a binary attribute bit stream, i.e., the attribute code stream.
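
The color conversion step can be illustrated with a short sketch. The text does not state which conversion matrix is used, so the BT.709 luma weights and the offset of 128 for the chroma components below are assumptions chosen for illustration, and the function name `rgb_to_yuv` is hypothetical.

```python
def rgb_to_yuv(r, g, b):
    """Convert 8-bit RGB to luma and two chroma components.

    BT.709 coefficients are an assumption; the source does not name a matrix.
    """
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    u = (b - y) / 1.8556 + 128  # blue-difference chroma
    v = (r - y) / 1.5748 + 128  # red-difference chroma
    return y, u, v

white = rgb_to_yuv(255, 255, 255)
red = rgb_to_yuv(255, 0, 0)
print(white, red)
```

A neutral color such as white maps to chroma values at the midpoint of 128, which is what makes the chroma channels cheaper to code for weakly saturated content.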

The flowchart of G-PCC decoding shown in FIG. 2 is applied to the decoder. The decoder obtains a binary code stream and independently decodes the geometric bit stream (i.e., the geometric code stream) and the attribute bit stream in the binary code stream. When decoding the geometric bit stream, the geometric information of the point cloud is obtained through arithmetic decoding, octree synthesis, surface fitting, geometry reconstruction and inverse coordinate transformation. When decoding the attribute bit stream, the attribute information of the point cloud is obtained through arithmetic decoding, inverse quantization, LOD-based inverse lifting or RAHT-based inverse transformation, and inverse color conversion. The three-dimensional image model of the point cloud data is then restored based on the geometric information and the attribute information.

The encoding method of the embodiment of the present application can be applied to the geometric information encoding process of G-PCC shown in FIG. 1. After voxelization is completed, the geometric encoding process in FIG. 1 is performed based on the voxelized second-scale point cloud to obtain a geometric bit stream. In the geometry reconstruction process of the geometric encoding, voxel downsampling is performed on the voxelized second-scale point cloud to determine the first-scale point cloud, and the first-scale voxels in the first-scale point cloud are upsampled to the second scale to determine the second-scale voxels corresponding to the first-scale voxels. Local density prediction is performed based on the first-scale point cloud to determine the local density corresponding to the first-scale voxels, and occupancy probability prediction is performed on the second-scale voxels to determine the occupancy probability corresponding to the second-scale voxels; the local density represents the number of occupied second-scale voxels among the second-scale voxels corresponding to a first-scale voxel. The reconstructed geometric information corresponding to the second-scale point cloud is determined according to the local density corresponding to the first-scale voxels and the occupancy probability corresponding to the second-scale voxels. In the attribute encoding process, the attribute bit stream is obtained based on the reconstructed geometric information corresponding to the second-scale point cloud.

The decoding method of the embodiment of the present application can be applied to the geometric information decoding process of G-PCC shown in FIG. 2. By parsing the bitstream, the coding information corresponding to the second-scale point cloud is determined, and the first-scale point cloud is determined; the first-scale point cloud is the previously decoded point cloud data corresponding to the second-scale point cloud. In the geometric decoding process, the coding information corresponding to the second-scale point cloud is subjected to arithmetic decoding, octree synthesis and surface fitting. In the geometry reconstruction process of geometric decoding, local density prediction is performed based on the first-scale point cloud to determine the local density corresponding to the first-scale voxels in the first-scale point cloud, and occupancy probability prediction is performed on the second-scale voxels to determine the occupancy probability corresponding to the second-scale voxels; the second-scale voxels are the upsampled voxels corresponding to the first-scale voxels, and the local density represents the number of occupied second-scale voxels among the second-scale voxels corresponding to a first-scale voxel. Based on the occupancy probability corresponding to the second-scale voxels and the local density corresponding to the first-scale voxels, the coding information corresponding to the second-scale point cloud is decoded and reconstructed to determine the reconstructed geometric information corresponding to the second-scale point cloud. In the attribute decoding process, the reconstructed geometric information corresponding to the second-scale point cloud is used to perform LOD-based inverse lifting or RAHT-based inverse transformation, followed by inverse color conversion, to obtain the attribute information of the second-scale point cloud, and the three-dimensional image model of the second-scale point cloud is restored based on the reconstructed geometric information and the attribute information.

It should be noted that the encoding method and decoding method of the embodiments of the present application can also be used in other point cloud encoding and decoding processes besides G-PCC.

The decoding method applied to a decoder provided in an embodiment of the present application is described below.

Refer to FIG. 3, an optional flowchart of a decoding method provided in an embodiment of the present application; the method is explained below in conjunction with the steps shown in FIG. 3.

S101, parsing a bitstream, determining coding information corresponding to a second-scale point cloud, and determining a first-scale point cloud.

In the embodiment of the present application, the decoder parses the received code stream to obtain the encoding information corresponding to the second scale point cloud and determines the first scale point cloud, wherein the first scale point cloud is the previously decoded point cloud data corresponding to the second scale point cloud.

In the embodiment of the present application, the code stream generally includes coding information, sent by the encoder, corresponding to point clouds of at least one scale. The decoder decodes in order from low scale to high scale. That is, before decoding the coding information corresponding to the second-scale point cloud, the decoder has already decoded the coding information of the preceding scale and thereby obtained the decoded first-scale point cloud. The decoder can then use the decoded, lower-scale first-scale point cloud to decode the coding information corresponding to the higher-scale second-scale point cloud and reconstruct its geometric information.

S102: performing local density prediction based on the first-scale point cloud to determine the local density corresponding to the first-scale voxels in the first-scale point cloud, and performing occupancy probability prediction on the second-scale voxels to determine the occupancy probability corresponding to the second-scale voxels.

In the embodiment of the present application, the geometric information of the first-scale point cloud and the second-scale point cloud has undergone a voxelization process of the encoder and can be represented in the form of a voxel grid.

In the embodiment of the present application, for the voxelization process, a point in the point cloud may correspond to an occupied voxel (i.e., a non-empty voxel), and an unoccupied voxel (i.e., an empty voxel) indicates that there is no point in the point cloud at the voxel position. In some embodiments, the occupied voxels may be marked as 1, and the unoccupied voxels may be marked as 0. In this way, the voxelized point cloud may represent the geometric data of the point cloud by the occupation symbols of the voxels at each position in the voxel grid.
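Exemplarily, the occupancy marking described above can be sketched as follows. This is only a minimal illustration, not the voxelization procedure of the encoder: the function name `voxelize`, the unit voxel size, and the sample coordinates are assumptions introduced here for explanation.

```python
import numpy as np

def voxelize(points, grid_shape):
    """Mark each voxel that contains at least one point as occupied (1), else 0.

    Assumption: points are already expressed in grid units, so the voxel index
    of a point is the integer part of each coordinate.
    """
    grid = np.zeros(grid_shape, dtype=np.uint8)
    idx = np.floor(points).astype(int)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return grid

# Hypothetical sample: the first two points fall into the same voxel.
points = np.array([[0.2, 0.7, 0.1],
                   [0.9, 0.1, 0.5],
                   [3.4, 2.0, 1.9]])
occupancy = voxelize(points, (4, 4, 2))
```

In this sketch several points falling into one voxel yield a single occupied voxel, so the voxel grid records only whether a position is occupied.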

In the embodiment of the present application, the scale of the point cloud corresponds to the scale of the voxels in the point cloud. That is, the voxels contained in the first-scale point cloud are first-scale voxels, and the voxels contained in the second-scale point cloud are second-scale voxels. Among them, the second-scale voxels are upsampled voxels corresponding to the first-scale voxels. The decoder can perform voxel upsampling on the first-scale voxels in the first-scale point cloud. For example, the first-scale voxels representing the occupied points in the first-scale point cloud are upsampled to obtain multiple second-scale voxels corresponding to each first-scale voxel.

In some embodiments, the decoder can implement voxel upsampling by pooling, for example using a max pooling layer with a stride of 2×2×2 to divide one first-scale voxel in the first-scale point cloud into eight second-scale voxels. Each upsampling halves the voxel size along each of the three dimensions, that is, the size of a second-scale voxel in each dimension is half that of a first-scale voxel. Upsampling each first-scale voxel in the first-scale point cloud completes the upsampling from the low-scale point cloud with known geometric information to the high-scale point cloud, and obtains the second-scale point cloud whose geometric information is to be reconstructed.

Please refer to Figure 4, which shows a first-scale point cloud containing 2×2×1 first-scale voxels. After one voxel upsampling, a second-scale point cloud containing 4×4×2 second-scale voxels is obtained. The first-scale point cloud is the decoded point cloud data. In Figure 4, the occupied first-scale voxels are represented by solid cubes, which represent the locations of the points in the point cloud. After the first-scale voxels are upsampled to second-scale voxels, whether the second-scale voxels are occupied needs to be further determined through a geometric reconstruction process, which is represented by empty cubes in Figure 4. However, the point cloud in Figure 4 is only exemplary, and the actual point cloud may include more voxels.
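Exemplarily, the mapping from one occupied first-scale voxel to its eight candidate second-scale voxels can be sketched with integer coordinates as follows. The names `CHILD_OFFSETS` and `upsample_voxels` and the sample coordinates are assumptions for illustration; the sketch only enumerates candidate positions and does not decide which of them are occupied.

```python
import numpy as np

# The 8 child offsets of a parent voxel when every axis is halved.
CHILD_OFFSETS = np.array(
    [[dx, dy, dz] for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)]
)

def upsample_voxels(parent_coords):
    """Return candidate child coordinates, shape (N, 8, 3), for N occupied parents."""
    parent_coords = np.asarray(parent_coords)
    return 2 * parent_coords[:, None, :] + CHILD_OFFSETS[None, :, :]

# Hypothetical occupied first-scale voxels in a 2x2x1 grid.
parents = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0]])
children = upsample_voxels(parents)
```

Each parent at coordinate (x, y, z) yields children covering (2x..2x+1, 2y..2y+1, 2z..2z+1), so a 2×2×1 grid of first-scale voxels corresponds to a 4×4×2 grid of second-scale voxels.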

In the embodiment of the present application, the local density represents the number of occupied second-scale voxels in the second-scale voxels corresponding to the first-scale voxels. Therefore, the decoder performs local density prediction based on the first-scale point cloud, determines the local density corresponding to the first-scale voxels in the first-scale point cloud, and can determine the number of occupied second-scale voxels in the multiple second-scale voxels corresponding to each first-scale voxel. In this way, the occupied second-scale voxels in the multiple second-scale voxels can be more accurately determined by combining the multiple occupation probabilities corresponding to the multiple second-scale voxels.
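Exemplarily, the quantity that the local density denotes can be illustrated by counting occupied second-scale voxels in a known occupancy grid. Note that the decoder predicts this value from the first-scale point cloud; the sketch below merely computes the count directly from a hypothetical second-scale grid to make the definition concrete. The function name `local_density` and the sample grid are assumptions.

```python
import numpy as np

def local_density(fine_grid):
    """Count occupied children inside each 2x2x2 block of a binary occupancy grid.

    Assumption: every dimension of fine_grid is even.
    """
    x, y, z = fine_grid.shape
    # Axis pairs (i, a), (j, b), (k, c) address fine_grid[2i+a, 2j+b, 2k+c].
    blocks = fine_grid.reshape(x // 2, 2, y // 2, 2, z // 2, 2)
    return blocks.sum(axis=(1, 3, 5))

# Hypothetical 4x4x2 second-scale occupancy grid.
fine = np.zeros((4, 4, 2), dtype=np.uint8)
fine[0, 0, 0] = fine[1, 1, 1] = fine[0, 1, 0] = 1   # three children of parent (0, 0, 0)
fine[2, 0, 0] = 1                                   # one child of parent (1, 0, 0)
density = local_density(fine)
```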

In the embodiment of the present application, the decoder predicts the occupancy probability of the second-scale voxels obtained by upsampling the first-scale voxels, and determines multiple occupancy probabilities corresponding to multiple second-scale voxels corresponding to the first-scale voxels.

In some embodiments, based on FIG. 3 , as shown in FIG. 5 , S102 may be implemented by executing the process of S201 - S203 as follows:

S201 : extracting features from geometric information of a first-scale point cloud to determine features of the first-scale point cloud.

S202, upsampling the first-scale point cloud features to the second scale, determining initial second-scale point cloud features, performing feature extraction on the initial second-scale point cloud features, determining second-scale point cloud features, and performing occupancy probability prediction based on the second-scale point cloud features to determine occupancy probabilities corresponding to second-scale voxels.

In an embodiment of the present application, the geometric information of the first scale point cloud may include the occupancy of each voxel in the first scale point cloud and the position information of each voxel. In some embodiments, the occupancy may be a preset flag such as 0 or 1, and the position information may be the three-dimensional coordinates of the voxel. In some embodiments, the geometric information of the first scale point cloud may be obtained by decoding and reconstructing the encoded information of the first scale point cloud, that is, the geometric information of the first scale point cloud may be equivalent to the reconstructed geometric information of the first scale point cloud.

In an embodiment of the present application, the decoder extracts features from the geometric information of the first-scale point cloud, maps the geometric information of the first-scale point cloud to a preset low-scale feature space, and determines the features of the first-scale point cloud. In some embodiments, the first-scale point cloud features may include implicit features extracted from the geometric information of the first-scale point cloud.

In the embodiment of the present application, the decoder upsamples the first-scale point cloud features to the second scale to determine the initial second-scale point cloud features. The decoder extracts features from the initial second-scale point cloud features to determine the second-scale point cloud features. The decoder then predicts the occupancy probability of the second-scale voxels based on the second-scale point cloud features, that is, it predicts the probability that a point of the point cloud falls within the voxel position corresponding to each second-scale voxel, and determines the occupancy probability corresponding to each second-scale voxel. It can be understood that the second-scale point cloud features are obtained by performing two feature extractions on the geometric information of the first-scale point cloud. Therefore, the second-scale point cloud features have better feature expression, which can improve the accuracy of the decoder's prediction of occupancy probability using the second-scale point cloud features.

S203 , performing local density prediction according to the first-scale point cloud features to determine the local density corresponding to the first-scale voxel.

In the embodiment of the present application, the decoder performs local density prediction based on the first-scale point cloud features, and predicts the number of occupied second-scale voxels in multiple second-scale voxels corresponding to each first-scale voxel in the first-scale point cloud as the local density corresponding to each first-scale voxel. It can be understood that the local density is a numerical value, which can be an integer value greater than or equal to 1 and less than or equal to the total number of second-scale voxels corresponding to a first-scale voxel.

It should be noted that the execution order of the feature upsampling, feature extraction of the initial second-scale point cloud features, and occupancy probability prediction processes in S202 and the local density prediction process in S203 is not limited to that shown in FIG. 5 . In practical applications, they can also be executed in any order or simultaneously, depending on the actual situation, and the embodiment of the present application does not limit this.

Exemplarily, the above process of S201-S203 can be shown in FIG. 6. The decoder extracts features of the geometric information of the first-scale point cloud through the third feature extraction network to determine the first-scale point cloud features; the first-scale point cloud features are respectively input into the first branch including the upsampling network, the fourth feature extraction network and the occupancy probability prediction network, and the second branch including the local density prediction network; the occupancy probability corresponding to each second-scale voxel is output through the first branch, and the local density corresponding to each first-scale voxel is output through the second branch. In some embodiments, the upsampling network can be implemented by a transposed convolutional network; the third feature extraction network is used to extract features of the geometric information of the first-scale point cloud, and the fourth feature extraction network is used to extract features of the initial second-scale point cloud features; the occupancy probability prediction network and the local density prediction network can be implemented using pre-trained neural networks.

In some embodiments, S102 in FIG. 3 may also be implemented by executing the process of S301-S304 as follows:

S301 . Perform feature extraction on geometric information of a first-scale point cloud through a first feature extraction network to determine first point cloud features of a first scale.

S302: up-sample the first point cloud features at the first scale to the second scale, determine the second-scale point cloud features, and perform occupancy probability prediction based on the second-scale point cloud features to determine the occupancy probability corresponding to the second-scale voxels.

S303 . Perform feature extraction on the geometric data of the first-scale point cloud through a second feature extraction network to determine second point cloud features of the first scale.

S304 , performing local density prediction according to the second point cloud features at the first scale, and determining the local density corresponding to the voxels at the first scale.

In an embodiment of the present application, the decoder can also use different feature extraction networks to extract different first-scale point cloud features, namely the first point cloud features and the second point cloud features, which are used for occupancy probability prediction and local density prediction, respectively.

In some embodiments, the first feature extraction network can be obtained by jointly learning or training with a neural network for occupancy probability prediction, and the second feature extraction network can be obtained by jointly learning or training with a local density prediction network. In this way, point cloud geometric features for predicting occupancy probability and point cloud geometric features for predicting local density can be extracted more specifically, thereby improving the accuracy of local density prediction and occupancy probability prediction.

Likewise, the embodiment of the present application does not limit the execution order of the S301 - S302 process and the S303 - S304 process.

Exemplarily, the above-mentioned process of S301-S304 can be shown in FIG. 7. The first branch in FIG. 7 includes a first feature extraction network, an upsampling network, and an occupancy probability prediction network; the second branch includes a second feature extraction network and a local density prediction network. The decoder inputs the geometric information of the first-scale point cloud into the first branch and the second branch respectively, and performs feature extraction and network prediction on the geometric information of the first-scale point cloud through the first branch and the second branch respectively; the occupancy probability corresponding to each second-scale voxel is output through the first branch, and the local density corresponding to each first-scale voxel is output through the second branch.

It should be noted that, in some embodiments, based on FIG. 7 , after the upsampling network and before the occupancy probability prediction network, the second-scale point cloud features output by the upsampling network may be further extracted, and the second-scale point cloud features after further feature extraction may be input into the occupancy probability prediction network for occupancy probability prediction to enhance feature expression, thereby improving the accuracy of occupancy probability prediction. The specific selection is made according to the actual situation, and the embodiments of the present application are not limited thereto.

In some embodiments, S102 in FIG. 3 may also be implemented by executing the process of S401-S403 as follows:

S401 : extracting features from geometric information of a first-scale point cloud to determine features of the first-scale point cloud.

S402: up-sample the first-scale point cloud features to a second scale, determine the second-scale point cloud features, and perform occupation probability prediction based on the second-scale point cloud features to determine the occupation probability corresponding to the second-scale voxels.

S403 , performing local density prediction according to the first-scale point cloud features to determine the local density corresponding to the first-scale voxel.

In the embodiment of the present application, the occupancy probability prediction and the local density prediction can be performed based on the first scale point cloud features determined by feature extraction of the geometric information of the first scale point cloud. In this way, the feature information can be reused, the module complexity can be reduced, and the processing efficiency of the decoder can be improved.

Likewise, the embodiment of the present application does not limit the execution order of S402 and S403.

Exemplarily, the above S401-S403 process can be shown in FIG. 8. The decoder extracts features from the geometric information of the first-scale point cloud through the fifth feature extraction network to determine the first-scale point cloud features; the first-scale point cloud features are respectively input into the first branch including the upsampling network and the occupancy probability prediction network, and the second branch including the local density prediction network; the occupancy probability corresponding to each second-scale voxel is output through the first branch, and the local density corresponding to each first-scale voxel is output through the second branch. Here, the fifth feature extraction network in FIG. 8 and the third feature extraction network in FIG. 6 can be the same or different feature extraction networks.

In some embodiments, for the local density prediction process in S203, S304, and S403 above, a local density prediction network can be used to perform local density prediction based on the first-scale point cloud features to determine the local density corresponding to the first-scale voxel. The local density prediction network may include: a first sparse convolution layer, a first activation function layer, a second sparse convolution layer, and a second activation function layer. Exemplarily, the local density prediction network may be as shown in FIG. 9, including: a sparse convolution layer as the first layer (i.e., the first sparse convolution layer), a ReLU activation function layer as the second layer (i.e., the first activation function layer), a sparse convolution layer as the third layer (i.e., the second sparse convolution layer), and a Sigmoid function layer as the fourth layer (i.e., the second activation function layer). The local density prediction network is used to output the local density according to implicit features of the point cloud.

In some embodiments, the feature extraction process in S201, S202, S301, S303, S401, and S402 can be implemented using a feature extraction network as shown in FIG. 10, including: a sparse convolution layer as the first layer, an activation function layer as the second layer (e.g., a ReLU activation function layer), a residual layer as the third layer, and a sparse convolution layer as the fourth layer. In some embodiments, the network structure of the residual layer of the third layer can be as shown in FIG. 11.

In some embodiments, the occupancy probability prediction process in S202, S302, and S402 can be implemented using an occupancy probability prediction network as shown in FIG. 12, including: a sparse convolution layer as the first layer, a ReLU activation function layer as the second layer, a sparse convolution layer as the third layer, a ReLU activation function layer as the fourth layer, a sparse convolution layer as the fifth layer, and a Sigmoid function layer as the sixth layer. The occupancy probability prediction network is used to perform occupancy prediction on each second-scale voxel obtained by upsampling each first-scale voxel according to the implicit features of the second-scale point cloud, and to obtain the occupancy probability corresponding to each second-scale voxel. The occupancy probability prediction network in FIG. 12 is only an example. In actual applications, it can also be a convolutional neural network (CNN) with another layer structure. The specific selection is made according to the actual situation, and the embodiments of the present application are not limited thereto.

In the embodiment of the present application, during the processing of the occupancy probability prediction network, for the unoccupied first-scale voxels in the low-scale point cloud (i.e., the first-scale point cloud), the multiple high-scale second-scale voxels obtained by upsampling are also unoccupied, and no probability prediction is required. However, the multiple second-scale voxels obtained by upsampling the occupied first-scale voxels in the first-scale point cloud need to be predicted. For example, the occupancy probability corresponding to the second-scale voxels on the side of the second-scale point cloud facing the paper surface in Figure 4 can be shown in Figure 13. The occupancy probability in Figure 13 is only for convenience of explanation and cannot be understood as the result of actual calculation. It can be seen that the closer to 1 the occupancy probability predicted for the actually occupied second-scale voxels is, and the closer to 0 the occupancy probability predicted for the actually unoccupied second-scale voxels is, the more accurate the prediction is, and the higher the reconstruction quality of the point cloud geometric information is.

S103 . Decode and reconstruct the encoded information corresponding to the second-scale point cloud based on the occupancy probability corresponding to the second-scale voxel and the local density corresponding to the first-scale voxel to determine the reconstructed geometric information corresponding to the second-scale point cloud.

In the embodiment of the present application, the decoder uses the predicted local density corresponding to each first-scale voxel in the first-scale point cloud and the occupancy probability corresponding to each second-scale voxel obtained by upsampling each first-scale voxel to determine whether each second-scale voxel is occupied, and then, according to whether each second-scale voxel is occupied, decodes and reconstructs the encoded information corresponding to the second-scale point cloud, decodes the position information corresponding to each occupied second-scale voxel in the second-scale point cloud, and determines the reconstructed geometric information corresponding to the second-scale point cloud according to the position information corresponding to each occupied second-scale voxel in the second-scale point cloud.

In some embodiments, based on FIG. 3 or FIG. 5 , as shown in FIG. 14 , S103 may be implemented by executing the process of S1031 - S1032 as follows:

S1031. For each first-scale voxel in the first-scale point cloud, among the multiple second-scale voxels corresponding to that first-scale voxel, determine as occupied second-scale voxels the second-scale voxels with the highest occupancy probabilities, the number of which equals the local density.

In the embodiment of the present application, since the local density represents the number of occupied second-scale voxels among the second-scale voxels corresponding to a first-scale voxel, the multiple occupancy probabilities of the multiple second-scale voxels corresponding to each first-scale voxel can be screened according to the local density corresponding to that first-scale voxel. That is, if the local density is k, the k second-scale voxels with the highest occupancy probabilities are determined as the occupied second-scale voxels.

For example, taking a first-scale voxel upsampled to obtain 8 second-scale voxels as an example, the local density corresponding to the first-scale voxel can be any value from 1 to 8. Taking a local density of 4 as an example, the decoder determines the 4 second-scale voxels with the highest occupancy probabilities among the 8 second-scale voxels as occupied second-scale voxels, and the other second-scale voxels are determined as unoccupied second-scale voxels. As shown in FIG. 15, a first-scale voxel in the first-scale point cloud is upsampled to obtain 8 second-scale voxels, and the local density corresponding to the first-scale voxel is 4; the occupancy probabilities corresponding to the 8 second-scale voxels are, from high to low, [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]. Then, according to the local density, the second-scale voxels with occupancy probabilities of [0.9, 0.8, 0.7, 0.6] can be determined as occupied second-scale voxels, and the other second-scale voxels, corresponding to [0.5, 0.4, 0.3, 0.2], are determined as unoccupied second-scale voxels.
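Exemplarily, the screening in S1031 can be sketched as a top-k selection over the occupancy probabilities of the 8 second-scale voxels of one first-scale voxel, with k equal to the local density. The function name `select_occupied` is an assumption for illustration, and the tie-breaking rule (the earlier voxel wins when probabilities are equal) is also an assumption, since it is not specified above.

```python
import numpy as np

def select_occupied(probs, density):
    """Mark the `density` children with the highest occupancy probability as occupied."""
    probs = np.asarray(probs, dtype=float)
    # Stable sort on negated probabilities: descending order, ties keep original order.
    order = np.argsort(-probs, kind="stable")
    occupied = np.zeros(probs.shape, dtype=bool)
    occupied[order[:density]] = True
    return occupied

# Values taken from the example above: 8 children, local density 4.
probs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
mask = select_occupied(probs, 4)

# The same selection when the probabilities are not pre-sorted.
shuffled_mask = select_occupied([0.2, 0.9, 0.5, 0.7, 0.3, 0.8, 0.4, 0.6], 4)
```

With the values of the example, the voxels with probabilities 0.9, 0.8, 0.7 and 0.6 are marked as occupied and the remaining four are marked as unoccupied.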

S1032. Based on the occupied second-scale voxels corresponding to each first-scale voxel, decode and reconstruct the encoded information corresponding to the second-scale point cloud to determine the reconstructed geometric information corresponding to the second-scale point cloud.

In the embodiment of the present application, the occupied second-scale voxels represent points in the point cloud that fall into the second-scale voxels obtained by upsampling the first-scale voxels. The decoder can decode and reconstruct the encoded information corresponding to the second-scale point cloud based on the occupied second-scale voxels corresponding to each first-scale voxel, thereby determining the reconstructed geometric information corresponding to the second-scale point cloud.

In some embodiments, the decoder may distinguish between occupied second-scale voxels and unoccupied second-scale voxels with different occupancy symbols according to the determination result of S1031. The encoded information corresponding to the second-scale point cloud may be decoded and reconstructed based on each occupied second-scale voxel represented by the occupancy symbol, and the reconstructed geometric information corresponding to the second-scale point cloud may be determined.

Here, when the decoder performs geometric reconstruction, it mainly decodes and reconstructs the geometric coding information in the coding information corresponding to the second scale point cloud. After determining the reconstructed geometric information corresponding to the second scale point cloud, the attribute coding information corresponding to the second scale point cloud can be decoded based on the reconstructed geometric information corresponding to the second scale point cloud through the attribute decoding process to determine the attribute information corresponding to the second scale point cloud, thereby combining the reconstructed geometric information and the attribute information corresponding to the second scale point cloud to restore the three-dimensional image model of the second scale point cloud.

It is understandable that in the embodiment of the present application, the decoder can determine, by predicting the local density, the number of occupied second-scale voxels among the second-scale voxels obtained by upsampling each first-scale voxel. The occupancy probabilities corresponding to the second-scale voxels can then be screened in combination with the local density to determine the occupancy of the second-scale voxels, the second-scale point cloud can be reconstructed according to that occupancy, and the reconstructed geometric information of the second-scale point cloud can be determined. In this way, the determined occupancy of the second-scale voxels is more accurate, and the accuracy of the reconstructed geometric information of the second-scale point cloud is improved, that is, the geometric reconstruction quality of the decoder and the decoding performance are improved.

In some embodiments, after S103, the decoder may continue to parse the code stream to determine the encoding information corresponding to the i-th scale point cloud, where i is an integer greater than or equal to 3. Using the above decoding method in the embodiment of the present application, the decoder performs local density prediction based on the reconstructed geometric information of the (i-1)-th scale point cloud to determine the local density corresponding to the (i-1)-th scale voxels, and performs occupancy probability prediction on the i-th scale voxels corresponding to the (i-1)-th scale voxels to determine the occupancy probability corresponding to the i-th scale voxels, where the i-th scale is obtained by upsampling the (i-1)-th scale. Based on the occupancy probability corresponding to the i-th scale voxels and the local density corresponding to the (i-1)-th scale voxels, the encoding information corresponding to the i-th scale point cloud is decoded and reconstructed to determine the reconstructed geometric information corresponding to the i-th scale point cloud. That is, after decoding and reconstructing the second-scale point cloud, the decoder may also use the decoding method in the embodiment of the present application to decode and reconstruct the encoding information of a higher-scale point cloud in the code stream, such as a third-scale point cloud, based on the reconstructed geometric information of the second-scale point cloud, to obtain the reconstructed geometric information corresponding to the third-scale point cloud. In this way, when decoding adjacent scales, the decoder can use the known geometric data of the previously decoded lower-scale point cloud to decode and reconstruct the encoded information of the higher-scale point cloud and determine its reconstructed geometric information. Decoding proceeds scale by scale until the reconstructed geometric information of the target-scale point cloud is restored; the target scale can be determined according to the preset decoding accuracy of the decoder.

In some embodiments, after S103, the decoder may also perform the above-mentioned decoding method in the embodiments of the present application to predict the local density of the n-th scale voxels and the occupancy probability of the (n+1)-th scale voxels based on the reconstructed geometric data corresponding to the n-th scale point cloud, and determine the local density corresponding to the n-th scale voxels and the occupancy probability corresponding to the (n+1)-th scale voxels, where n is a positive integer greater than or equal to 2 and the (n+1)-th scale is obtained by upsampling the n-th scale. Based on the local density corresponding to the n-th scale voxels and the occupancy probability corresponding to the (n+1)-th scale voxels, the decoder determines the reconstructed geometric data corresponding to the (n+1)-th scale point cloud.
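Exemplarily, the scale-by-scale reconstruction from an n-th scale point cloud to an (n+1)-th scale point cloud can be sketched as a loop in which each step upsamples the occupied voxels, obtains a local density per parent and an occupancy probability per child, and keeps the highest-probability children. This is a simplified sketch under stated assumptions: it omits code stream parsing entirely, and `predict_density` and `predict_probs` are hypothetical placeholders standing in for the trained prediction networks, replaced here by fixed stub functions.

```python
import numpy as np

OFFSETS = np.array([[dx, dy, dz] for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)])

def decode_next_scale(coords, predict_density, predict_probs):
    """One step from scale n to scale n+1 for N occupied voxels with coords (N, 3)."""
    children = 2 * coords[:, None, :] + OFFSETS[None, :, :]   # candidates, (N, 8, 3)
    density = predict_density(coords)                          # (N,) integers in 1..8
    probs = predict_probs(coords)                              # (N, 8) values in [0, 1]
    order = np.argsort(-probs, axis=1, kind="stable")          # children by descending prob
    rank = np.argsort(order, axis=1, kind="stable")            # rank of each child (0 = best)
    keep = rank < density[:, None]                             # top-k children per parent
    return children[keep]                                      # occupied children, (K, 3)

def decode_to_scale(coords, steps, predict_density, predict_probs):
    """Apply the per-scale step repeatedly until the target scale is reached."""
    for _ in range(steps):
        coords = decode_next_scale(coords, predict_density, predict_probs)
    return coords

# Stub predictors (assumptions): every parent has density 2 and the same probabilities.
def stub_density(coords):
    return np.full(len(coords), 2)

def stub_probs(coords):
    return np.tile(np.linspace(0.9, 0.2, 8), (len(coords), 1))

start = np.array([[0, 0, 0]])
result = decode_to_scale(start, 2, stub_density, stub_probs)
```

Because each step depends only on the coordinates of the preceding scale, the loop can stop after any number of steps, which corresponds to ending the decoding at a chosen target scale.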

Exemplarily, for the coding information of point clouds of multiple non-adjacent scales contained in the code stream, such as the coding information corresponding to the first-scale point cloud, the coding information corresponding to the third-scale point cloud, and the coding information corresponding to the fifth-scale point cloud, etc., the decoder can perform voxel upsampling based on the first-scale voxels in the first-scale point cloud to obtain the second-scale voxels on the basis of decoding and reconstructing the first-scale point cloud; and extract features from the geometric information of the first-scale point cloud, perform local density prediction on the first-scale voxels, and perform occupation probability prediction on the second-scale voxels, and reconstruct the reconstructed geometric data corresponding to the second-scale point cloud according to the determined local density of the first-scale voxels and the occupation probability of the second-scale voxels, combined with the second position information of the second-scale voxels determined by upsampling the first position information of the first-scale voxels. The reconstructed geometric data corresponding to the second-scale point cloud is used to decode and reconstruct the coding information corresponding to the third-scale point cloud, and so on.

It should be noted that the decoding method in the embodiment of the present application can be applied to a scalable encoding and decoding method, that is, for the encoding information of point clouds of multiple scales sent by the encoder side, the decoder can decode and reconstruct the point cloud of any scale, in order from low scale to high scale, according to the actual decoding accuracy requirements. Exemplarily, the encoder writes the encoding information of the first-scale point cloud through the encoding information of the fifth-scale point cloud into the code stream and sends it. According to the preset accuracy requirements, the decoder can use the decoding method provided in the embodiment of the present application to decode from the first-scale point cloud to the third-scale point cloud, reconstruct the geometric data of the third-scale point cloud and restore the three-dimensional image model of the third-scale point cloud, and then end the decoding without decoding the encoding information corresponding to the higher-scale point clouds, up to the fifth-scale point cloud. The specific selection is based on the actual situation, and the embodiment of the present application is not limited thereto.

It can be understood that the decoding method provided in the embodiment of the present application can be repeatedly applied between multiple adjacent scales, and the decoding between each group of adjacent scales is independent of each other, so scale-scalable decoding can be flexibly implemented.

It should be noted that each decoding process of the above decoder uses the decoded low-scale point cloud as known information to decode the encoded information of the high-scale point cloud. For the first decoding process of the decoder, the known information may be unencoded information on a preset number of points sent by the encoder side. The encoder may send the information on the preset number of points, such as the coordinates of 100 points in the point cloud, directly to the decoder in unencoded form as the first known information, so that the decoder does not need to decode the first known information, but directly uses the position information of the preset number of points sent by the encoder to reconstruct the point cloud of the corresponding scale and continue the subsequent decoding process.

The following describes an encoding method applied to an encoder provided in an embodiment of the present application.

Refer to Figure 16, which is an optional flowchart of the encoding method provided in an embodiment of the present application. The method is explained below in conjunction with the steps shown in Figure 16.

S501: Down-sample the second-scale point cloud to determine the first-scale point cloud, and up-sample the first-scale voxels in the first-scale point cloud to the second scale to determine the second-scale voxels corresponding to the first-scale voxels.

In the embodiment of the present application, the encoder voxelizes the original point cloud data of the second-scale point cloud. The voxelized point cloud represents the geometric data of the point cloud by the occupancy symbol of the voxel at each position in the voxel grid. The encoder then performs voxel downsampling on the voxelized second-scale point cloud to determine the first-scale point cloud.

In some embodiments, the encoder can implement voxel downsampling by pooling. As shown in FIG. 17, a max pooling layer with a stride of 2×2×2 merges every 8 second-scale voxels into 1 first-scale voxel. After voxel downsampling, 3 of the 4 first-scale voxels of the first-scale point cloud are occupied and 1 first-scale voxel is unoccupied. The size of a first-scale voxel in each of the three dimensions is twice that of a second-scale voxel. The encoder marks the occupancy of each voxel with an occupancy symbol. For example, FIG. 18 shows how voxel downsampling turns the occupancy symbols of the second-scale voxels on the side of the second-scale point cloud facing the paper into the occupancy symbols of the first-scale voxels on the side of the first-scale point cloud facing the paper. In this way, voxel downsampling yields the geometric data of the relatively low-scale first-scale point cloud.
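
As an illustration of the pooling-based downsampling described above, the following sketch stores the occupancy grid as a set of occupied integer coordinates, so that 2×2×2 max pooling with stride 2 reduces to halving each coordinate. The function name `downsample` and the sample coordinates are assumptions made for the example; they are not taken from FIG. 17.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def downsample(voxels: Set[Voxel]) -> Set[Voxel]:
    """2x2x2 max pooling with stride 2 on a sparse occupancy grid.

    A first-scale voxel is occupied if at least one of its 8 second-scale
    children is occupied, which is what max pooling over 0/1 occupancy
    symbols computes.
    """
    return {(x // 2, y // 2, z // 2) for (x, y, z) in voxels}


# Made-up second-scale cloud in a 4 x 4 x 2 grid (2 x 2 x 1 after pooling).
second_scale = {(0, 0, 0), (1, 0, 1), (2, 1, 0), (3, 3, 1)}
first_scale = downsample(second_scale)

# All 4 candidate first-scale voxels, to see which one stays empty.
all_parents = {(x, y, 0) for x in (0, 1) for y in (0, 1)}
unoccupied = all_parents - first_scale
```

With this sample data, three of the four first-scale voxels end up occupied and one stays empty, mirroring the situation described for FIG. 17.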

In the embodiment of the present application, the encoder then upsamples the first-scale point cloud to the second scale, determines a plurality of second-scale voxels corresponding to each first-scale voxel, and implements a lossless encoding process by predicting the occupancy of the second-scale voxels.

S502: Perform local density prediction based on the first-scale point cloud to determine the local density corresponding to the first-scale voxel, and perform occupancy probability prediction on the second-scale voxel to determine the occupancy probability corresponding to the second-scale voxel.

In the embodiment of the present application, the local density represents the number of occupied second-scale voxels among the second-scale voxels corresponding to a first-scale voxel. The process by which the encoder performs local density prediction based on the first-scale point cloud to determine the local density corresponding to the first-scale voxels, and performs occupancy probability prediction on the second-scale voxels to determine the occupancy probability corresponding to the second-scale voxels, is the same as the corresponding processing in the decoder and will not be repeated here.
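
To make the definition of local density concrete, the sketch below counts, for each first-scale voxel, how many of its second-scale children are occupied. It illustrates the quantity only: in the embodiment the local density is predicted from the first-scale point cloud, whereas this example computes it directly from a known second-scale cloud. The function name and the sample coordinates are assumptions.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Voxel = Tuple[int, int, int]


def local_density(second_scale: Set[Voxel]) -> Dict[Voxel, int]:
    """Number of occupied second-scale voxels under each first-scale voxel.

    The parent of (x, y, z) under 2x2x2 downsampling is (x//2, y//2, z//2),
    so every value in the result lies between 1 and 8.
    """
    return dict(Counter((x // 2, y // 2, z // 2) for (x, y, z) in second_scale))


# Made-up second-scale cloud: two occupied children share parent (0, 0, 0).
second_scale = {(0, 0, 0), (1, 0, 1), (2, 1, 0), (3, 3, 1)}
density = local_density(second_scale)
```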

In some embodiments, the process of S502 may include: performing feature extraction on the geometric information of the first-scale point cloud to determine the first-scale point cloud features; upsampling the first-scale point cloud features to the second scale to determine the initial second-scale point cloud features, performing feature extraction on the initial second-scale point cloud features to determine the second-scale point cloud features, and performing occupancy probability prediction based on the second-scale point cloud features to determine the occupancy probability corresponding to the second-scale voxel; performing local density prediction based on the first-scale point cloud features to determine the local density corresponding to the first-scale voxel.

Alternatively, in some embodiments, the process of S502 may also include: performing feature extraction on the geometric information of the first-scale point cloud through a first feature extraction network to determine the first point cloud features of the first scale; upsampling the first point cloud features of the first scale to a second scale to determine the second-scale point cloud features, and performing occupancy probability prediction based on the second-scale point cloud features to determine the occupancy probability corresponding to the second-scale voxel; performing feature extraction on the geometric data of the first-scale point cloud through a second feature extraction network to determine the second point cloud features of the first scale; performing local density prediction based on the second point cloud features of the first scale to determine the local density corresponding to the first-scale voxel.

Alternatively, in some embodiments, the process of S502 may also include: extracting features from the geometric information of the first-scale point cloud to determine the first-scale point cloud features; upsampling the first-scale point cloud features to the second scale to determine the second-scale point cloud features, and performing occupancy probability prediction based on the second-scale point cloud features to determine the occupancy probability corresponding to the second-scale voxel; performing local density prediction based on the first-scale point cloud features to determine the local density corresponding to the first-scale voxel.

In the above-mentioned local density prediction process, the encoder can use the local density prediction network to perform local density prediction according to the first-scale point cloud features to determine the local density corresponding to the first-scale voxel; wherein the local density prediction network includes: a first sparse convolution layer, a first activation function layer, a second sparse convolution layer and a second activation function layer.

The above processing is consistent with the description of the same processing of the decoder, and will not be repeated here.

S503: Determine the reconstructed geometric information corresponding to the second-scale point cloud according to the local density corresponding to the first-scale voxel and the occupancy probability corresponding to the second-scale voxel.

In the embodiment of the present application, the encoder determines the reconstructed geometric information corresponding to the second-scale point cloud based on the local density corresponding to the first-scale voxel and the occupancy probability corresponding to the second-scale voxel, which is consistent with the description of the same processing process of the decoder and is not repeated here.

In some embodiments, S503 may be implemented by executing the process of S5031-S5032 as follows:

S5031: For each first-scale voxel in the first-scale point cloud, among the multiple second-scale voxels corresponding to that first-scale voxel, determine the second-scale voxels with the highest occupancy probabilities, as many as the local density of that first-scale voxel, as occupied second-scale voxels;

S5032: Determine reconstructed geometric information corresponding to the second-scale point cloud based on the occupied second-scale voxels corresponding to each first-scale voxel.

Here, the process of S5031-S5032 is consistent with the above-mentioned process description of S1031-S1032, and will not be repeated here.
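
The screening in S5031-S5032 can be sketched as a per-parent top-k selection, where k is the local density. This is a minimal sketch under stated assumptions: the names `children` and `reconstruct` are hypothetical, the local density is assumed to be an integer already, and ties between equal probabilities are broken by enumeration order, which the embodiment does not specify.

```python
from typing import Dict, List, Set, Tuple

Voxel = Tuple[int, int, int]


def children(parent: Voxel) -> List[Voxel]:
    """The 8 second-scale voxels obtained by upsampling one first-scale voxel."""
    x, y, z = parent
    return [
        (2 * x + dx, 2 * y + dy, 2 * z + dz)
        for dx in (0, 1)
        for dy in (0, 1)
        for dz in (0, 1)
    ]


def reconstruct(
    first_scale: Set[Voxel],
    density: Dict[Voxel, int],
    prob: Dict[Voxel, float],
) -> Set[Voxel]:
    """Keep, per first-scale voxel, the `density` most probable children."""
    occupied: Set[Voxel] = set()
    for parent in first_scale:
        k = density[parent]
        # Sort the 8 candidates from high to low occupancy probability.
        ranked = sorted(children(parent), key=lambda v: prob[v], reverse=True)
        occupied.update(ranked[:k])
    return occupied


# Made-up probabilities for one first-scale voxel with local density 2.
parent = (0, 0, 0)
prob = {v: 0.1 for v in children(parent)}
prob[(1, 0, 1)] = 0.9
prob[(0, 1, 0)] = 0.45
rebuilt = reconstruct({parent}, {parent: 2}, prob)

# For contrast only (assumption): a fixed 0.5 threshold keeps a single voxel.
thresholded = {v for v, p in prob.items() if p > 0.5}
```

In this example the density-guided selection keeps the voxel with probability 0.45, which a fixed threshold of 0.5 would discard.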

S504: Perform encoding based on the reconstructed geometric information corresponding to the second-scale point cloud, determine the encoding information corresponding to the second-scale point cloud, and write the encoding information into a bitstream.

In an embodiment of the present application, the encoder can perform recoloring processing based on the reconstructed geometric information corresponding to the second-scale point cloud to obtain colored point cloud data, perform color information encoding based on the colored point cloud data, and determine the attribute information encoding corresponding to the second-scale point cloud; perform geometric information encoding on the point cloud data of the second-scale point cloud to determine the geometric information encoding corresponding to the second-scale point cloud; and combine the geometric information encoding and the attribute information encoding to determine the encoding information corresponding to the second-scale point cloud.

In some embodiments, the entropy coding may use a context-based adaptive binary arithmetic coding (CABAC) algorithm, but is not limited thereto. According to the principle of entropy coding, the more accurate the prediction of voxel occupancy, the smaller the information entropy and the greater the saving in actual bit rate and bandwidth. In this way, the encoder writes the encoding information corresponding to the second-scale point cloud into the bitstream and sends it to the decoder, and the decoder parses the bitstream to obtain the encoding information corresponding to the second-scale point cloud. Using the decoding method provided in the embodiment of the present application, the decoder feeds the geometric data of the previously decoded low-scale point cloud (such as the first-scale point cloud) to the entropy decoder, so that the geometric data of the second-scale point cloud can be reconstructed losslessly, that is, the reconstructed geometric information corresponding to the second-scale point cloud is determined. Geometric decoding and attribute decoding of the encoding information of the second-scale point cloud are then performed based on this reconstructed geometric information, and the three-dimensional image model of the second-scale point cloud is restored.
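
The link between prediction accuracy and bit rate can be illustrated with the ideal code length of an arithmetic coder, which is the sum of -log2 of the probability assigned to each symbol actually coded. The sketch below is not CABAC itself; the function `ideal_bits` and the sample probabilities are assumptions chosen for the example, and every probability must lie strictly between 0 and 1.

```python
import math
from typing import Sequence


def ideal_bits(occupancy: Sequence[int], probs: Sequence[float]) -> float:
    """Ideal code length in bits for binary occupancy symbols.

    probs[i] is the predicted probability that voxel i is occupied.
    The cost of a symbol is -log2 of the probability given to its true value.
    """
    total = 0.0
    for occupied, p in zip(occupancy, probs):
        total += -math.log2(p if occupied else 1.0 - p)
    return total


# Made-up occupancy symbols of the 8 children of one first-scale voxel.
occupancy = [1, 0, 0, 1, 0, 0, 0, 0]

# An uninformative predictor versus a confident and correct one.
flat_bits = ideal_bits(occupancy, [0.5] * 8)
sharp_bits = ideal_bits(occupancy, [0.9, 0.1, 0.1, 0.9, 0.1, 0.1, 0.1, 0.1])
```

The uninformative predictor costs one bit per voxel, while the confident and correct predictor costs far less, which is the effect the paragraph above describes.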

It can be understood that in the embodiment of the present application, based on the local density corresponding to the first-scale voxels, the occupancy probability corresponding to the second-scale voxels is screened, which can improve the accuracy of determining the occupancy status of the second-scale voxels, and then improve the accuracy of encoding the reconstructed geometric information of the second-scale point cloud determined based on the occupancy status of the second-scale voxels, thereby improving the encoding performance.

In some embodiments, the encoder can also perform at least one voxel downsampling based on the first-scale point cloud through the same encoding process, determine a point cloud with a lower scale than the first-scale point cloud, and complete the encoding of multiple-scale point clouds scale by scale. The encoder writes the encoding information of the multiple-scale point clouds into the bitstream and sends it to the decoder.
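
Repeated voxel downsampling yields a pyramid of scales that can then be encoded scale by scale. The following sketch shows only the pyramid construction, with the hypothetical name `build_pyramid` and made-up coordinates; the per-scale encoding itself is not shown.

```python
from typing import List, Set, Tuple

Voxel = Tuple[int, int, int]


def build_pyramid(finest: Set[Voxel], levels: int) -> List[Set[Voxel]]:
    """Apply 2x2x2 voxel downsampling `levels` times.

    Returns the clouds ordered from the lowest scale to the finest scale,
    which is the order in which a decoder would reconstruct them.
    """
    pyramid = [finest]
    for _ in range(levels):
        coarser = {(x // 2, y // 2, z // 2) for (x, y, z) in pyramid[-1]}
        pyramid.append(coarser)
    return pyramid[::-1]


# Made-up finest-scale cloud, downsampled twice.
finest = {(0, 0, 0), (3, 3, 3), (7, 0, 0)}
pyramid = build_pyramid(finest, 2)
```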

It can be understood that the encoding method provided in the embodiment of the present application can be repeatedly applied between multiple adjacent scales, and the encoding between each group of adjacent scales is independent of each other, so scale-scalable encoding can be flexibly implemented.

Below, in conjunction with Figure 19, the application of the decoding method provided in an embodiment of the present application in an actual scenario is explained.

As shown in FIG. 19, the encoder performs G-PCC encoding on the original point cloud data of the first-scale point cloud to obtain the encoding information corresponding to the first-scale point cloud, and sends it to the decoder through the code stream. The decoder performs G-PCC decoding on the encoding information corresponding to the first-scale point cloud, and determines the reconstructed geometric information of the first-scale point cloud through the geometric decoding process in G-PCC decoding. Taking the geometric decoding process as an example, the decoder extracts features from the reconstructed geometric information of the first-scale point cloud to obtain the first-scale point cloud features, and upsamples the first-scale point cloud features to the second scale with a 2×2×2 transposed convolution to determine the initial second-scale point cloud features; the decoder then extracts features from the initial second-scale point cloud features to determine the second-scale point cloud features. Based on the second-scale point cloud features, the decoder predicts the occupancy probability of the multiple second-scale voxels obtained by upsampling each first-scale voxel in the first-scale point cloud, and obtains the multiple occupancy probabilities corresponding to those second-scale voxels; based on the first-scale point cloud features, it predicts the local density of each first-scale voxel to obtain the local density corresponding to each first-scale voxel.
During the reconstruction process, the decoder screens, according to the local density corresponding to each first-scale voxel, the multiple occupancy probabilities corresponding to the multiple second-scale voxels upsampled from that first-scale voxel. Illustratively, the multiple occupancy probabilities are sorted from high to low, the highest occupancy probabilities are selected, as many as the local density, and the second-scale voxels corresponding to the selected occupancy probabilities are determined as the occupied second-scale voxels. The reconstructed geometric information of the second-scale point cloud is determined from the occupied second-scale voxels corresponding to each first-scale voxel in the first-scale point cloud. Based on the reconstructed geometric information of the second-scale point cloud, the reconstructed geometric information of the third-scale point cloud is obtained by the same process.
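
The feature upsampling step mentioned above, a 2×2×2 transposed convolution, can be sketched in plain Python. With a kernel size and stride of 2 in each dimension, the outputs of neighbouring first-scale voxels do not overlap, so each second-scale feature depends on exactly one parent feature and one kernel offset. The function name, the weight layout and the omission of a bias term are assumptions of this sketch; a practical implementation would typically use a sparse convolution library.

```python
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]
Matrix = List[List[float]]  # shape: C_out x C_in


def transposed_conv_2x2x2(
    features: Dict[Voxel, List[float]],
    weights: Dict[Voxel, Matrix],
) -> Dict[Voxel, List[float]]:
    """Upsample sparse features with kernel size 2 and stride 2.

    features maps each occupied first-scale voxel to its feature vector.
    weights maps each kernel offset (dx, dy, dz) to a C_out x C_in matrix.
    """
    out: Dict[Voxel, List[float]] = {}
    for (x, y, z), f in features.items():
        for (dx, dy, dz), w in weights.items():
            child = (2 * x + dx, 2 * y + dy, 2 * z + dz)
            # Matrix-vector product: one output channel per row of w.
            out[child] = [sum(wij * fj for wij, fj in zip(row, f)) for row in w]
    return out


# Made-up weights (C_in = 2, C_out = 1) and a single first-scale feature.
offsets = [(dx, dy, dz) for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)]
weights = {(dx, dy, dz): [[1.0 + dx, float(dy + dz)]] for (dx, dy, dz) in offsets}
features = {(1, 0, 0): [1.0, 2.0]}
upsampled = transposed_conv_2x2x2(features, weights)
```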

It can be understood that the embodiment of the present application has good geometric reconstruction quality and better encoding and decoding performance than the traditional G-PCC method. In some embodiments, the applicant conducted a comparative encoding and decoding test on multiple point cloud data sets using the encoding and decoding method of the embodiment of the present application and the traditional G-PCC method. The results are shown in Table 1, as follows:

Table 1

Point cloud dataset	Rate-distortion (BD-rate) performance gain of the embodiment of the present application compared with traditional G-PCC
facade_00009_vox12	-25.91%
house_without_roof_00057_vox12	-43.11%
boxer_viewdep_vox12	-42.76%
soldier_viewdep_vox12	-45.60%
Average gain	-39.35%

In Table 1, facade_00009_vox12, house_without_roof_00057_vox12, boxer_viewdep_vox12 and soldier_viewdep_vox12 are different point cloud data sets, and the second column gives the rate-distortion (Bjontegaard-Delta rate, BD-rate) performance gain relative to traditional G-PCC. The BD-rate value is negatively correlated with codec performance: the smaller the value, that is, the larger the magnitude of a negative value, the better the codec performance. It can be seen that, compared with the traditional G-PCC codec method, the embodiment of the present application achieves a BD-rate reduction of up to 45.60% and of 39.35% on average. This data illustrates the improvement in codec performance.
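
The average and the best case quoted above can be checked directly from the four per-dataset values in Table 1. The variable names below are chosen for the example only.

```python
# BD-rate values from Table 1, in percent (negative means bit-rate saving).
bd_rate = {
    "facade_00009_vox12": -25.91,
    "house_without_roof_00057_vox12": -43.11,
    "boxer_viewdep_vox12": -42.76,
    "soldier_viewdep_vox12": -45.60,
}

# Arithmetic mean over the four data sets.
average = sum(bd_rate.values()) / len(bd_rate)

# The data set with the most negative value has the largest saving.
best_dataset = min(bd_rate, key=bd_rate.get)
```

The mean works out to about -39.345, which Table 1 reports as -39.35%, and the largest saving is on soldier_viewdep_vox12.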

The embodiment of the present application provides a decoder 1, as shown in FIG. 20, including:

The parsing part 11 is configured to parse the bit stream and determine the encoding information corresponding to the second scale point cloud;

The determining part 12 is configured to determine a first scale point cloud; the first scale point cloud is the previously decoded point cloud data corresponding to the second scale point cloud;

The first prediction part 13 is configured to perform local density prediction based on the first-scale point cloud to determine the local density corresponding to the first-scale voxel in the first-scale point cloud, and to perform occupancy probability prediction on the second-scale voxel to determine the occupancy probability corresponding to the second-scale voxel; the second-scale voxel is an upsampled voxel corresponding to the first-scale voxel; the local density represents the number of occupied second-scale voxels among the second-scale voxels corresponding to the first-scale voxel;

The decoding and reconstruction part 14 is configured to decode and reconstruct the encoded information corresponding to the second scale point cloud based on the occupancy probability corresponding to the second scale voxel and the local density corresponding to the first scale voxel, and determine the reconstructed geometric information corresponding to the second scale point cloud.

In some embodiments, the decoding and reconstruction part 14 is further configured to, for each first-scale voxel in the first-scale point cloud, among the multiple second-scale voxels corresponding to that first-scale voxel, determine the second-scale voxels with the highest occupancy probabilities, as many as the local density of that first-scale voxel, as occupied second-scale voxels; and, based on the occupied second-scale voxels corresponding to each first-scale voxel, decode and reconstruct the encoded information corresponding to the second-scale point cloud to determine the reconstructed geometric information corresponding to the second-scale point cloud.

In some embodiments, the first prediction part 13 is further configured to perform feature extraction on the geometric information of the first-scale point cloud to determine the first-scale point cloud features; upsample the first-scale point cloud features to the second scale to determine the initial second-scale point cloud features, perform feature extraction on the initial second-scale point cloud features to determine the second-scale point cloud features, and perform occupancy probability prediction based on the second-scale point cloud features to determine the occupancy probability corresponding to the second-scale voxel; perform local density prediction based on the first-scale point cloud features to determine the local density corresponding to the first-scale voxel.

In some embodiments, the first prediction part 13 is further configured to perform feature extraction on the geometric information of the first-scale point cloud through a first feature extraction network to determine the first point cloud features of the first scale; upsample the first point cloud features of the first scale to a second scale to determine the second-scale point cloud features, and perform occupancy probability prediction based on the second-scale point cloud features to determine the occupancy probability corresponding to the second-scale voxel; perform feature extraction on the geometric data of the first-scale point cloud through a second feature extraction network to determine the second point cloud features of the first scale; perform local density prediction based on the second point cloud features of the first scale to determine the local density corresponding to the first-scale voxel.

In some embodiments, the first prediction part 13 is further configured to perform feature extraction on the geometric information of the first-scale point cloud to determine the first-scale point cloud features; upsample the first-scale point cloud features to the second scale to determine the second-scale point cloud features, and perform occupancy probability prediction based on the second-scale point cloud features to determine the occupancy probability corresponding to the second-scale voxel; perform local density prediction based on the first-scale point cloud features to determine the local density corresponding to the first-scale voxel.

In some embodiments, the first prediction part 13 is further configured to use a local density prediction network to perform local density prediction according to the first-scale point cloud features to determine the local density corresponding to the first-scale voxel.

In some embodiments, the local density prediction network includes:

The first sparse convolution layer, the first activation function layer, the second sparse convolution layer and the second activation function layer.

In some embodiments, the parsing part 11 is further configured to parse the code stream to determine the encoding information corresponding to the i-th scale point cloud, where i is an integer greater than or equal to 3; the first prediction part 13 is further configured to perform local density prediction of the (i-1)-th scale voxels based on the reconstructed geometric information of the (i-1)-th scale point cloud to determine the local density corresponding to the (i-1)-th scale voxels, and to perform occupancy probability prediction on the i-th scale voxels corresponding to the (i-1)-th scale voxels to determine the occupancy probability corresponding to the i-th scale voxels, the i-th scale being obtained by upsampling the (i-1)-th scale; the decoding and reconstruction part 14 is further configured to decode and reconstruct the encoding information corresponding to the i-th scale point cloud based on the occupancy probability corresponding to the i-th scale voxels and the local density corresponding to the (i-1)-th scale voxels, and determine the reconstructed geometric information corresponding to the i-th scale point cloud.

In some embodiments, the first prediction part 13 is further configured to perform local density prediction of the n-th scale voxel and occupancy probability prediction of the n+1-th scale voxel based on the reconstructed geometric data corresponding to the n-th scale point cloud, and determine the local density corresponding to the n-th scale voxel and the occupancy probability corresponding to the n+1-th scale voxel; n is a positive integer greater than or equal to 2; the n+1-th scale is obtained by upsampling the n-th scale; the decoding and reconstruction part 14 is further configured to determine the reconstructed geometric data corresponding to the n+1-th scale point cloud based on the local density corresponding to the n-th scale voxel and the occupancy probability corresponding to the n+1-th scale voxel.

The embodiment of the present application provides an encoder 2, as shown in FIG. 21, including:

A downsampling part 21 is configured to perform voxel downsampling on the second scale point cloud to determine a first scale point cloud;

The second prediction part 22 is configured to perform local density prediction based on the first-scale point cloud to determine the local density corresponding to the first-scale voxel, and to perform occupancy probability prediction on the second-scale voxel to determine the occupancy probability corresponding to the second-scale voxel; the local density represents the number of occupied second-scale voxels among the second-scale voxels corresponding to the first-scale voxel;

A reconstruction part 23, configured to determine the reconstructed geometric information corresponding to the second-scale point cloud according to the local density corresponding to the first-scale voxel and the occupancy probability corresponding to the second-scale voxel;

The encoding part 24 is configured to perform encoding based on the reconstructed geometric information corresponding to the second-scale point cloud, determine the encoding information corresponding to the second-scale point cloud, and write the encoding information into a bitstream.

In some embodiments, the reconstruction part 23 is further configured to, for each first-scale voxel in the first-scale point cloud, among the multiple second-scale voxels corresponding to that first-scale voxel, determine the second-scale voxels with the highest occupancy probabilities, as many as the local density of that first-scale voxel, as occupied second-scale voxels; and determine the reconstructed geometric information corresponding to the second-scale point cloud based on the occupied second-scale voxels corresponding to each first-scale voxel.

In some embodiments, the second prediction part 22 is further configured to perform feature extraction on the geometric information of the first-scale point cloud to determine the first-scale point cloud features; upsample the first-scale point cloud features to the second scale to determine the initial second-scale point cloud features, perform feature extraction on the initial second-scale point cloud features to determine the second-scale point cloud features, and perform occupancy probability prediction based on the second-scale point cloud features to determine the occupancy probability corresponding to the second-scale voxel; perform local density prediction based on the first-scale point cloud features to determine the local density corresponding to the first-scale voxel.

In some embodiments, the second prediction part 22 is further configured to perform feature extraction on the geometric information of the first-scale point cloud through a first feature extraction network to determine the first point cloud features of the first scale; upsample the first point cloud features of the first scale to a second scale to determine the second-scale point cloud features, and perform occupancy probability prediction based on the second-scale point cloud features to determine the occupancy probability corresponding to the second-scale voxel; perform feature extraction on the geometric data of the first-scale point cloud through a second feature extraction network to determine the second point cloud features of the first scale; perform local density prediction based on the second point cloud features of the first scale to determine the local density corresponding to the first-scale voxel.

In some embodiments, the second prediction part 22 is further configured to perform feature extraction on the geometric information of the first-scale point cloud to determine the first-scale point cloud features; upsample the first-scale point cloud features to the second scale to determine the second-scale point cloud features, and perform occupancy probability prediction based on the second-scale point cloud features to determine the occupancy probability corresponding to the second-scale voxel; perform local density prediction based on the first-scale point cloud features to determine the local density corresponding to the first-scale voxel.

In some embodiments, the second prediction part 22 is further configured to use a local density prediction network to perform local density prediction based on the first-scale point cloud features to determine the local density corresponding to the first-scale voxel; wherein the local density prediction network includes: a first sparse convolution layer, a first activation function layer, a second sparse convolution layer and a second activation function layer.

In some embodiments, the encoding part 24 is further configured to perform recoloring processing based on the reconstructed geometric information corresponding to the second-scale point cloud to obtain colored point cloud data, perform color information encoding based on the colored point cloud data, and determine the attribute information encoding corresponding to the second-scale point cloud; perform geometric information encoding on the point cloud data of the second-scale point cloud to determine the geometric information encoding corresponding to the second-scale point cloud; and combine the geometric information encoding and the attribute information encoding to determine the encoding information corresponding to the second-scale point cloud.

It should be noted that the description of the above device embodiment is similar to the description of the above method embodiment, and has similar beneficial effects as the method embodiment. For technical details not disclosed in the device embodiment of the present application, please refer to the description of the method embodiment of the present application for understanding.

In some embodiments, the embodiment of the present application further provides a decoder. FIG. 22 is an optional structural diagram of the decoder 3 provided in the embodiment of the present application. As shown in FIG. 22, the decoder 3 includes a first memory 32 and a first processor 33, which are connected through a first communication bus 34. The first memory 32 is used to store executable instructions; the first processor 33 is used to execute the executable instructions stored in the first memory 32 to implement the decoding method provided in the embodiment of the present application.

In some embodiments, the embodiment of the present application further provides an encoder. FIG. 23 is an optional structural diagram of the encoder 4 provided in the embodiment of the present application. As shown in FIG. 23, the encoder 4 includes a second memory 42 and a second processor 43, which are connected through a second communication bus 44. The second memory 42 is used to store executable instructions; the second processor 43 is used to execute the executable instructions stored in the second memory 42 to implement the encoding method provided in the embodiment of the present application.

An embodiment of the present application provides a computer-readable storage medium storing executable instructions. When the executable instructions are executed by a first processor, the first processor is caused to execute any one of the decoding methods provided in the embodiments of the present application; or, when the executable instructions are executed by a second processor, the second processor is caused to execute any one of the encoding methods provided in the embodiments of the present application.

In some embodiments, the computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface storage, optical disk, or CD-ROM; or it may be various devices including one or any combination of the above memories.

In some embodiments, executable instructions may be in the form of a program, software, software module, script or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine or other unit suitable for use in a computing environment.

As an example, executable instructions may, but do not necessarily, correspond to a file in a file system, may be stored as part of a file that stores other programs or data, such as, for example, in one or more scripts in a Hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files storing one or more modules, subroutines, or code portions).

By way of example, executable instructions may be deployed to be executed on one computing device, or on multiple computing devices located at one site, or on multiple computing devices distributed across multiple sites and interconnected by a communication network.

Those skilled in the art will appreciate that the embodiments of the present application may be provided as methods, systems, or computer program products. Therefore, the present application may adopt the form of hardware embodiments, software embodiments, or embodiments in combination with software and hardware. Moreover, the present application may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage, etc.) that contain computer-usable program code.

The present application is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to the embodiments of the present application. It should be understood that each process and/or box in the flowcharts and/or block diagrams, and any combination of processes and/or boxes in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured product including an instruction device that implements the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.

These computer program instructions may also be loaded onto a computer or other programmable data processing device so that a series of operational steps are executed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.

The above is only a preferred embodiment of the present application and is not intended to limit the protection scope of the present application. Any modifications, equivalent replacements and improvements made within the spirit and scope of the present application are included in the protection scope of the present application.

Industrial Applicability

The embodiments of the present application provide a coding and decoding method, a decoder, an encoder and a computer-readable storage medium. By predicting the local density, the decoder can determine how many of the second-scale voxels obtained by upsampling each first-scale voxel are occupied. The occupancy probabilities corresponding to the second-scale voxels can then be filtered in combination with the local density to determine the occupancy of the second-scale voxels, the second-scale point cloud can be reconstructed according to that occupancy, and the reconstructed geometric information of the second-scale point cloud can be determined. This makes the determined occupancy of the second-scale voxels more accurate, which improves the accuracy of the reconstructed geometric information of the second-scale point cloud, the reconstructed geometric quality at the decoder, and the decoding performance. Likewise, in the encoder, filtering the occupancy probabilities corresponding to the second-scale voxels based on the local density corresponding to the first-scale voxels improves the accuracy of the determined occupancy of the second-scale voxels, and therefore the accuracy of encoding based on the reconstructed geometric information of the second-scale point cloud determined from that occupancy, thereby improving the encoding performance.
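
The way the local density and the occupancy probabilities are combined can be illustrated with a minimal Python sketch. It assumes that the local density acts as a count k and that the k most probable child voxels are kept. The function name `select_occupied`, the rounding of a real-valued density, and the clamping to at least one child are illustrative assumptions and are not taken from the present application.

```python
from typing import List, Sequence, Tuple

Voxel = Tuple[int, int, int]


def select_occupied(children: Sequence[Voxel],
                    probabilities: Sequence[float],
                    local_density: float) -> List[Voxel]:
    """Keep the k most probable child voxels, with k given by the local density."""
    if len(children) != len(probabilities):
        raise ValueError("children and probabilities must have equal length")
    # Assumption: a real-valued prediction is rounded and clamped to
    # [1, len(children)], since an occupied parent has at least one occupied child.
    k = max(1, min(len(children), int(round(local_density))))
    # Rank child indices by descending probability; sorted() is stable,
    # so ties keep their original order.
    ranked = sorted(range(len(children)), key=lambda i: -probabilities[i])
    # Return the selected children in their original order.
    return [children[i] for i in sorted(ranked[:k])]


# Hypothetical values for four candidate children of one first-scale voxel.
children = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]
probs = [0.9, 0.2, 0.7, 0.4]
occupied = select_occupied(children, probs, local_density=2.2)
```

With these hypothetical values the density rounds to 2, so only the two most probable children are marked as occupied. A fixed probability threshold would instead keep however many children happen to exceed it.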

Claims

A decoding method, comprising:

Parsing a bitstream to determine encoded information corresponding to a second-scale point cloud, and determining a first-scale point cloud; the first-scale point cloud is previously decoded point cloud data corresponding to the second-scale point cloud;

Performing local density prediction based on the first-scale point cloud to determine a local density corresponding to a first-scale voxel in the first-scale point cloud, and performing occupancy probability prediction on a second-scale voxel to determine an occupancy probability corresponding to the second-scale voxel; the second-scale voxel is an upsampled voxel corresponding to the first-scale voxel; the local density represents the number of occupied second-scale voxels among the second-scale voxels corresponding to the first-scale voxel;

Based on the occupancy probability corresponding to the second-scale voxel and the local density corresponding to the first-scale voxel, the encoded information corresponding to the second-scale point cloud is decoded and reconstructed to determine the reconstructed geometric information corresponding to the second-scale point cloud.
The method according to claim 1, wherein the decoding and reconstructing the encoded information corresponding to the second-scale point cloud based on the occupancy probability corresponding to the second-scale voxel and the local density corresponding to the first-scale voxel to determine the reconstructed geometric information corresponding to the second-scale point cloud comprises:

For each first-scale voxel in the first-scale point cloud, determining, among the multiple second-scale voxels corresponding to that first-scale voxel, the second-scale voxels with the highest occupancy probabilities, equal in number to the local density of that first-scale voxel, as occupied second-scale voxels;

Based on the occupied second-scale voxels corresponding to each first-scale voxel, the encoded information corresponding to the second-scale point cloud is decoded and reconstructed to determine the reconstructed geometric information corresponding to the second-scale point cloud.
The method according to claim 1, wherein the performing local density prediction based on the first-scale point cloud to determine the local density corresponding to the first-scale voxel in the first-scale point cloud, and performing occupancy probability prediction on the second-scale voxel to determine the occupancy probability corresponding to the second-scale voxel comprises:

Extracting features from the geometric information of the first-scale point cloud to determine features of the first-scale point cloud;

Upsampling the first-scale point cloud features to a second scale to determine initial second-scale point cloud features, performing feature extraction on the initial second-scale point cloud features to determine second-scale point cloud features, and performing occupancy probability prediction based on the second-scale point cloud features to determine occupancy probabilities corresponding to voxels at the second scale;

A local density prediction is performed according to the first-scale point cloud features to determine a local density corresponding to the first-scale voxel.
The method according to claim 1, wherein the performing local density prediction based on the first-scale point cloud to determine the local density corresponding to the first-scale voxel in the first-scale point cloud, and performing occupancy probability prediction on the second-scale voxel to determine the occupancy probability corresponding to the second-scale voxel comprises:

Extracting features from the geometric information of the first-scale point cloud through a first feature extraction network to determine first point cloud features at a first scale;

Upsampling the first point cloud features at the first scale to a second scale, determining the second-scale point cloud features, and performing occupancy probability prediction based on the second-scale point cloud features to determine the occupancy probability corresponding to the second-scale voxel;

Performing feature extraction on the geometric information of the first-scale point cloud through a second feature extraction network to determine second point cloud features at the first scale;

A local density prediction is performed according to the second point cloud features at the first scale to determine a local density corresponding to the first-scale voxel.
The method according to claim 1, wherein the performing local density prediction based on the first-scale point cloud to determine the local density corresponding to the first-scale voxel in the first-scale point cloud, and performing occupancy probability prediction on the second-scale voxel to determine the occupancy probability corresponding to the second-scale voxel comprises:

Extracting features from the geometric information of the first-scale point cloud to determine features of the first-scale point cloud;

Upsampling the first-scale point cloud features to a second scale, determining second-scale point cloud features, and performing occupancy probability prediction based on the second-scale point cloud features to determine occupancy probabilities corresponding to voxels at the second scale;

A local density prediction is performed according to the first-scale point cloud features to determine a local density corresponding to the first-scale voxel.
The method according to any one of claims 3 to 5, wherein the method further comprises:

A local density prediction network is used to perform local density prediction according to the first-scale point cloud features to determine the local density corresponding to the first-scale voxel.
The method according to claim 6, wherein the local density prediction network comprises:

The first sparse convolution layer, the first activation function layer, the second sparse convolution layer and the second activation function layer.
The method according to any one of claims 1 to 5 or claim 7, wherein the method further comprises:

Parsing the bitstream to determine encoded information corresponding to the i-th scale point cloud; i is an integer greater than or equal to 3;

Based on the reconstructed geometric information of the (i-1)-th scale point cloud, the local density of the (i-1)-th scale voxel is predicted to determine the local density corresponding to the (i-1)-th scale voxel, and the occupancy probability of the i-th scale voxel corresponding to the (i-1)-th scale voxel is predicted to determine the occupancy probability corresponding to the i-th scale voxel; the i-th scale is obtained by upsampling the (i-1)-th scale;

Based on the occupancy probability corresponding to the i-th scale voxel and the local density corresponding to the (i-1)-th scale voxel, the encoded information corresponding to the i-th scale point cloud is decoded and reconstructed to determine the reconstructed geometric information corresponding to the i-th scale point cloud.
The method according to any one of claims 1 to 5 or claim 7, wherein the method further comprises:

Based on the reconstructed geometric information corresponding to the n-th scale point cloud, the local density of the n-th scale voxel and the occupancy probability of the (n+1)-th scale voxel are predicted, and the local density corresponding to the n-th scale voxel and the occupancy probability corresponding to the (n+1)-th scale voxel are determined; n is an integer greater than or equal to 2; the (n+1)-th scale is obtained by upsampling the n-th scale;

Based on the local density corresponding to the n-th scale voxel and the occupancy probability corresponding to the (n+1)-th scale voxel, the reconstructed geometric information corresponding to the (n+1)-th scale point cloud is determined.
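
The scale-by-scale prediction described above can be sketched as a loop in which each reconstructed scale is the input for predicting the next. The following Python sketch is illustrative only. It assumes a factor-2 upsampling with 8 candidate children per voxel, and the two predictors are passed in as hypothetical callables (`predict_density`, `predict_probability`) that stand in for the prediction networks; the toy predictors carry no meaning beyond making the example runnable.

```python
from typing import Callable, Dict, List, Set, Tuple

Voxel = Tuple[int, int, int]
DensityFn = Callable[[Set[Voxel]], Dict[Voxel, int]]
ProbabilityFn = Callable[[Set[Voxel], Voxel], float]


def children_of(parent: Voxel) -> List[Voxel]:
    """Assumption: one scale step doubles the resolution along each axis."""
    px, py, pz = parent
    return [(2 * px + dx, 2 * py + dy, 2 * pz + dz)
            for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)]


def reconstruct_next_scale(cloud: Set[Voxel],
                           predict_density: DensityFn,
                           predict_probability: ProbabilityFn) -> Set[Voxel]:
    """Predict the next finer scale from the current reconstructed scale."""
    density = predict_density(cloud)
    result: Set[Voxel] = set()
    for parent in cloud:
        candidates = children_of(parent)
        # Clamp the predicted count to a valid range (an assumption).
        k = max(1, min(len(candidates), density[parent]))
        ranked = sorted(candidates, key=lambda c: -predict_probability(cloud, c))
        result.update(ranked[:k])
    return result


def reconstruct(cloud: Set[Voxel], steps: int,
                predict_density: DensityFn,
                predict_probability: ProbabilityFn) -> Set[Voxel]:
    """Apply the per-scale prediction repeatedly, coarse to fine."""
    for _ in range(steps):
        cloud = reconstruct_next_scale(cloud, predict_density, predict_probability)
    return cloud


# Toy stand-ins for the prediction networks (purely hypothetical).
def toy_density(cloud: Set[Voxel]) -> Dict[Voxel, int]:
    return {voxel: 2 for voxel in cloud}


def toy_probability(cloud: Set[Voxel], child: Voxel) -> float:
    return 1.0 / (1 + sum(child))


result = reconstruct({(0, 0, 0)}, 2, toy_density, toy_probability)
```

Because the toy density is always 2, the number of voxels doubles at each step: one voxel becomes two, then four.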
An encoding method, comprising:

Downsampling the second-scale point cloud to determine the first-scale point cloud, and upsampling the first-scale voxels in the first-scale point cloud to the second scale to determine the second-scale voxels corresponding to the first-scale voxels;

Performing local density prediction based on the first-scale point cloud to determine the local density corresponding to the first-scale voxel, and performing occupancy probability prediction on the second-scale voxel to determine the occupancy probability corresponding to the second-scale voxel; the local density represents the number of occupied second-scale voxels among the second-scale voxels corresponding to the first-scale voxel;

Determining reconstructed geometric information corresponding to the second-scale point cloud according to the local density corresponding to the first-scale voxel and the occupancy probability corresponding to the second-scale voxel;

Encoding is performed based on the reconstructed geometric information corresponding to the second-scale point cloud, encoded information corresponding to the second-scale point cloud is determined, and the encoded information corresponding to the second-scale point cloud is written into a bitstream.
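
The relationship between the two scales and the meaning of the local density can be illustrated with a small Python sketch. It assumes that one scale step halves the resolution along each axis, so that each first-scale voxel has 8 candidate second-scale voxels; this factor and the function names `downsample` and `upsample` are assumptions. The sketch counts the true number of occupied children per parent, which is the quantity the predicted local density represents; the method itself predicts this value from the first-scale point cloud.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Voxel = Tuple[int, int, int]


def downsample(points: Set[Voxel]) -> Dict[Voxel, int]:
    """Map each occupied parent voxel to its number of occupied children."""
    # Assumption: one scale step halves the resolution along each axis.
    return dict(Counter((x // 2, y // 2, z // 2) for x, y, z in points))


def upsample(parent: Voxel) -> List[Voxel]:
    """List the 8 candidate child voxels of one parent voxel."""
    px, py, pz = parent
    return [(2 * px + dx, 2 * py + dy, 2 * pz + dz)
            for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)]


# Hypothetical second-scale point cloud given as occupied voxel coordinates.
second_scale = {(0, 0, 0), (1, 1, 0), (2, 3, 1), (5, 4, 4)}
density = downsample(second_scale)   # parent voxel -> occupied-child count
first_scale = set(density)           # occupied first-scale voxels
candidates = {parent: upsample(parent) for parent in first_scale}
```

Here the first-scale voxel (0, 0, 0) has a local density of 2, and the other two first-scale voxels each have a local density of 1.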
The method according to claim 10, wherein determining the reconstructed geometric information corresponding to the second-scale point cloud according to the local density corresponding to the first-scale voxel and the occupancy probability corresponding to the second-scale voxel comprises:

For each first-scale voxel in the first-scale point cloud, determining, among the multiple second-scale voxels corresponding to that first-scale voxel, the second-scale voxels with the highest occupancy probabilities, equal in number to the local density of that first-scale voxel, as occupied second-scale voxels;

Based on the occupied second-scale voxels corresponding to each first-scale voxel, reconstructed geometric information corresponding to the second-scale point cloud is determined.
The method according to claim 10, wherein the performing local density prediction based on the first-scale point cloud to determine the local density corresponding to the first-scale voxel, and performing occupancy probability prediction on the second-scale voxel to determine the occupancy probability corresponding to the second-scale voxel comprises:

Extracting features from the geometric information of the first-scale point cloud to determine features of the first-scale point cloud;

Upsampling the first-scale point cloud features to a second scale to determine initial second-scale point cloud features, performing feature extraction on the initial second-scale point cloud features to determine second-scale point cloud features, and performing occupancy probability prediction based on the second-scale point cloud features to determine occupancy probabilities corresponding to voxels at the second scale;

A local density prediction is performed according to the first-scale point cloud features to determine a local density corresponding to the first-scale voxel.
The method according to claim 10, wherein the performing local density prediction based on the first-scale point cloud to determine the local density corresponding to the first-scale voxel, and performing occupancy probability prediction on the second-scale voxel to determine the occupancy probability corresponding to the second-scale voxel comprises:

Extracting features from the geometric information of the first-scale point cloud through a first feature extraction network to determine first point cloud features at a first scale;

Upsampling the first point cloud features at the first scale to a second scale, determining the second-scale point cloud features, and performing occupancy probability prediction based on the second-scale point cloud features to determine the occupancy probability corresponding to the second-scale voxel;

Performing feature extraction on the geometric information of the first-scale point cloud through a second feature extraction network to determine second point cloud features at the first scale;

A local density prediction is performed according to the second point cloud features at the first scale to determine a local density corresponding to the first-scale voxel.
The method according to claim 10, wherein the performing local density prediction based on the first-scale point cloud to determine the local density corresponding to the first-scale voxel, and performing occupancy probability prediction on the second-scale voxel to determine the occupancy probability corresponding to the second-scale voxel comprises:

Extracting features from the geometric information of the first-scale point cloud to determine features of the first-scale point cloud;

Upsampling the first-scale point cloud features to a second scale, determining second-scale point cloud features, and performing occupancy probability prediction based on the second-scale point cloud features to determine occupancy probabilities corresponding to voxels at the second scale;

A local density prediction is performed according to the first-scale point cloud features to determine a local density corresponding to the first-scale voxel.
The method according to any one of claims 12 to 14, wherein the method further comprises:

Using a local density prediction network, performing local density prediction according to the first-scale point cloud features, and determining a local density corresponding to the first-scale voxel;

Wherein, the local density prediction network comprises:

The first sparse convolution layer, the first activation function layer, the second sparse convolution layer and the second activation function layer.
The method according to any one of claims 10 to 14, wherein the performing encoding based on the reconstructed geometric information corresponding to the second-scale point cloud, determining the encoded information corresponding to the second-scale point cloud, and writing the encoded information into the bitstream comprises:

Recoloring is performed based on the reconstructed geometric information corresponding to the second-scale point cloud to obtain colored point cloud data, and color information is encoded based on the colored point cloud data to determine the attribute information encoding corresponding to the second-scale point cloud;

Performing geometric information encoding on the point cloud data of the second-scale point cloud to determine the geometric information encoding corresponding to the second-scale point cloud;

The geometric information encoding and the attribute information encoding are determined as the encoded information corresponding to the second-scale point cloud.
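
The recoloring step can be illustrated with a short Python sketch. The present application does not specify here how recoloring is carried out; the sketch assumes a nearest-neighbor color transfer, in which each reconstructed point takes the color of the closest original point. The function name `recolor` and the sample data are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Voxel = Tuple[int, int, int]
Color = Tuple[int, int, int]


def recolor(original: Dict[Voxel, Color],
            reconstructed: Iterable[Voxel]) -> Dict[Voxel, Color]:
    """Assign to each reconstructed point the color of its nearest original point."""
    if not original:
        raise ValueError("the original point cloud must not be empty")

    def squared_distance(a: Voxel, b: Voxel) -> int:
        return sum((p - q) ** 2 for p, q in zip(a, b))

    colored: Dict[Voxel, Color] = {}
    for point in reconstructed:
        # Brute-force nearest-neighbor search; adequate for a small example.
        nearest = min(original, key=lambda source: squared_distance(source, point))
        colored[point] = original[nearest]
    return colored


# Hypothetical original colors and reconstructed geometry.
original = {(0, 0, 0): (255, 0, 0), (4, 4, 4): (0, 0, 255)}
reconstructed = [(0, 0, 1), (3, 4, 4), (0, 0, 0)]
colored = recolor(original, reconstructed)
```

The resulting colored point cloud data uses the reconstructed coordinates, so the color information that is encoded matches the geometry that the decoder will reconstruct.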
A decoder, comprising:

A parsing part, configured to parse the bitstream and determine the encoded information corresponding to the second-scale point cloud;

The determining part is configured to determine a first-scale point cloud; the first-scale point cloud is the previously decoded point cloud data corresponding to the second-scale point cloud;

The prediction part is configured to perform local density prediction based on the first-scale point cloud, determine the local density corresponding to the first-scale voxel in the first-scale point cloud, and perform occupancy probability prediction on the second-scale voxel to determine the occupancy probability corresponding to the second-scale voxel; the second-scale voxel is an upsampled voxel corresponding to the first-scale voxel; the local density represents the number of occupied second-scale voxels among the second-scale voxels corresponding to the first-scale voxel;

The decoding and reconstruction part is configured to decode and reconstruct the encoded information corresponding to the second-scale point cloud based on the occupancy probability corresponding to the second-scale voxel and the local density corresponding to the first-scale voxel, and determine the reconstructed geometric information corresponding to the second-scale point cloud.
An encoder, comprising:

A downsampling part, configured to perform voxel downsampling on the second-scale point cloud to determine a first-scale point cloud;

A local density prediction part, configured to perform local density prediction based on the first-scale point cloud to determine the local density corresponding to the first-scale voxel;

The occupancy probability prediction part is configured to upsample the first-scale voxels in the first-scale point cloud to the second scale, determine the second-scale voxels corresponding to the first-scale voxels, and perform occupancy probability prediction on the second-scale voxels to determine the occupancy probability corresponding to the second-scale voxels;

A reconstruction part, configured to determine the reconstructed geometric information corresponding to the second-scale point cloud according to the local density corresponding to the first-scale voxel and the occupancy probability corresponding to the second-scale voxel;

The encoding part is configured to perform encoding based on the reconstructed geometric information corresponding to the second-scale point cloud, determine the encoded information corresponding to the second-scale point cloud, and write the encoded information into a bitstream.
A bitstream, wherein:

The bitstream is generated by bit encoding according to encoded information; the encoded information includes at least the encoded information corresponding to the second-scale point cloud.
A decoder, comprising:

A first memory configured to store executable instructions;

The first processor is configured to implement the method according to any one of claims 1 to 9 when executing the executable instructions stored in the first memory.
An encoder, comprising:

a second memory configured to store executable instructions;

The second processor is configured to implement the method according to any one of claims 10 to 16 when executing the executable instructions stored in the second memory.
A computer-readable storage medium storing executable instructions for causing a first processor to execute the method described in any one of claims 1 to 9, or for causing a second processor to execute the method described in any one of claims 10 to 16.