WO2024082105A1 - Encoding method, decoding method, decoder, encoder and computer-readable storage medium

Info

Publication number: WO2024082105A1
Authority: WIPO (PCT)
Application number: PCT/CN2022/125742
Prior art keywords: scale, point cloud, voxel, local density, voxels
Other languages: English (en), Chinese (zh)
Inventors: 马展, 薛瑞翔, 魏红莲
Original Assignee: Oppo广东移动通信有限公司
Application filed by Oppo广东移动通信有限公司
Priority to PCT/CN2022/125742
Publication of WO2024082105A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals

Definitions

  • the present application relates to point cloud compression coding and decoding technology, and in particular to a coding and decoding method, a decoder, an encoder and a computer-readable storage medium.
  • A point cloud is a collection of points that stores the geometric position and related attribute information of each point, so as to accurately describe objects in space.
  • Point cloud data is voluminous: a single frame of point cloud can contain millions of points, which makes effective storage and transmission difficult. Compression is therefore used to reduce redundant information in point cloud storage and to facilitate subsequent processing.
  • Representative point cloud compression algorithms include Video-based Point Cloud Compression (V-PCC) and Geometry-based Point Cloud Compression (G-PCC).
  • the geometric compression in G-PCC is mainly implemented through octree models and/or triangular surface models.
  • V-PCC is mainly implemented through three-dimensional to two-dimensional projection and video compression.
  • the structural information of the real environment expressed by the point cloud is restored by reconstructing the geometric information of the point cloud.
  • the process of reconstructing the geometric information of the point cloud includes: using a sparse convolutional neural network, taking the voxelized geometric information of the low-scale point cloud as input, and predicting the occupancy probability of each high-scale voxel in the high-scale point cloud.
  • the occupancy symbol of each high-scale voxel is determined, and the geometric information of the high-scale point cloud is reconstructed according to the occupancy symbol representing the occupied high-scale voxels.
  • the method of geometric reconstruction based on occupancy probability prediction is prone to inaccurate occupancy symbol determination when the occupancy probabilities of multiple high-scale voxels are close or the occupancy probability threshold is set unreasonably, thereby reducing the quality of geometric reconstruction and further reducing the encoding and decoding performance.
  • the embodiments of the present application provide a coding and decoding method, a decoder, an encoder and a computer-readable storage medium, which can improve the quality of geometric reconstruction of point cloud coding and decoding, thereby improving coding and decoding performance.
  • the present application provides a decoding method, including:
  • the first-scale point cloud is the previously decoded point cloud data corresponding to the second-scale point cloud;
  • a local density prediction is performed to determine a local density corresponding to a first-scale voxel in the first-scale point cloud, and an occupation probability prediction is performed on a second-scale voxel to determine an occupation probability corresponding to the second-scale voxel;
  • the second-scale voxel is an upsampled voxel corresponding to the first-scale voxel;
  • the local density represents the number of occupied second-scale voxels in the second-scale voxel corresponding to the first-scale voxel;
  • the encoded information corresponding to the second-scale point cloud is decoded and reconstructed to determine the reconstructed geometric information corresponding to the second-scale point cloud.
  • the present application provides an encoding method, including:
  • the local density represents the number of occupied second-scale voxels in the second-scale voxels corresponding to the first-scale voxels;
  • Encoding is performed based on the reconstructed geometric information corresponding to the second-scale point cloud, encoding information corresponding to the second-scale point cloud is determined, and the encoding information is written into a bitstream.
  • the present application provides a decoder, including:
  • a parsing part configured to parse the bitstream and determine the encoding information corresponding to the second scale point cloud
  • the determining part is configured to determine a first-scale point cloud; the first-scale point cloud is the previously decoded point cloud data corresponding to the second-scale point cloud;
  • a local density prediction part is configured to perform local density prediction based on the first-scale point cloud to determine the local density corresponding to the first-scale voxel in the first-scale point cloud;
  • the occupancy probability prediction part is configured to perform occupancy probability prediction on a second-scale voxel based on the first-scale point cloud, and determine the occupancy probability corresponding to the second-scale voxel;
  • the second-scale voxel is an upsampled voxel corresponding to the first-scale voxel;
  • the decoding and reconstruction part is configured to decode and reconstruct the encoded information corresponding to the second-scale point cloud based on the occupancy probability corresponding to the second-scale voxel and the local density corresponding to the first-scale voxel, and determine the reconstructed geometric information corresponding to the second-scale point cloud.
  • the present application provides an encoder, including:
  • a downsampling part configured to perform voxel downsampling on the second scale point cloud to determine a first scale point cloud
  • a local density prediction part configured to perform local density prediction based on the first-scale point cloud to determine the local density corresponding to the first-scale voxel
  • the occupancy probability prediction part is configured to upsample the first scale voxels in the first scale point cloud to the second scale, determine the second scale voxels corresponding to the first scale voxels; and perform occupancy probability prediction on the second scale voxels to determine the occupancy probability corresponding to the second scale voxels;
  • a reconstruction part configured to determine the reconstructed geometric information corresponding to the second-scale point cloud according to the local density corresponding to the first-scale voxel and the occupancy probability corresponding to the second-scale voxel;
  • the encoding part is configured to perform encoding based on the reconstructed geometric information corresponding to the second-scale point cloud, determine the encoding information corresponding to the second-scale point cloud, and write the encoding information into a bitstream.
  • the embodiment of the present application provides a code stream, including:
  • the code stream is generated by bit encoding according to the coding information; wherein the coding information at least includes: coding information corresponding to the second-scale point cloud.
  • the present application provides a decoder, including:
  • a first memory configured to store executable instructions
  • the first processor is configured to implement any of the decoding methods described above when executing the executable instructions stored in the first memory.
  • the present application provides an encoder, including:
  • a second memory configured to store executable instructions
  • the second processor is configured to implement any of the encoding methods described above when executing the executable instructions stored in the second memory.
  • An embodiment of the present application provides a computer-readable storage medium storing executable instructions for causing a first processor to execute to implement the above-mentioned decoding method, or for causing a second processor to execute to implement the above-mentioned encoding method.
  • An embodiment of the present application provides a computer program product, including a computer program or instructions.
  • When the computer program or instructions are executed by a first processor, the decoding method provided by the embodiments of the present application is implemented; or, when the computer program or instructions are executed by a second processor, the encoding method provided by the embodiments of the present application is implemented.
  • the embodiment of the present application provides a coding and decoding method, a decoder, an encoder and a computer-readable storage medium.
  • the decoder can determine the number of occupied second-scale voxels among the second-scale voxels obtained by upsampling each first-scale voxel by predicting the local density. In this way, the occupancy probabilities corresponding to the second-scale voxels can be screened in combination with the local density to determine the occupancy of the second-scale voxels, reconstruct the second-scale point cloud according to the occupancy of the second-scale voxels, and determine the reconstructed geometric information of the second-scale point cloud.
  • the occupancy of the determined second-scale voxel can be made more accurate, and the accuracy of the reconstructed geometric information of the second-scale point cloud can be improved, that is, the reconstructed geometric quality of the decoder is improved, and the decoding performance is improved.
  • the occupancy probability corresponding to the second-scale voxel is screened, which can improve the accuracy of determining the occupancy of the second-scale voxel, and then improve the accuracy of encoding the reconstructed geometric information of the second-scale point cloud determined based on the occupancy of the second-scale voxel, thereby improving the encoding performance.
  • FIG1 is a flow chart of G-PCC coding
  • FIG2 is a flow chart of G-PCC decoding
  • FIG3 is a schematic diagram of an optional flow chart of a decoding method provided in an embodiment of the present application.
  • FIG4 is a schematic diagram of an optional process of voxel upsampling provided in an embodiment of the present application.
  • FIG5 is a schematic diagram of an optional flow chart of a decoding method provided in an embodiment of the present application.
  • FIG6 is a schematic diagram of an optional flow chart of an occupancy probability prediction and local density prediction process provided in an embodiment of the present application.
  • FIG7 is a schematic diagram of an optional flow chart of an occupancy probability prediction and local density prediction process provided in an embodiment of the present application.
  • FIG8 is a schematic diagram of an optional flow chart of an occupancy probability prediction and local density prediction process provided in an embodiment of the present application.
  • FIG9 is a schematic diagram of an optional structure of a local density prediction network provided in an embodiment of the present application.
  • FIG10 is a schematic diagram of an optional structure of a feature extraction network provided in an embodiment of the present application.
  • FIG11 is a schematic diagram of an optional structure of a residual layer in a feature extraction network provided in an embodiment of the present application.
  • FIG12 is a schematic diagram of an optional structure of an occupancy probability prediction network provided in an embodiment of the present application.
  • FIG13 is a schematic diagram of an occupancy probability corresponding to a second-scale voxel provided in an embodiment of the present application.
  • FIG14 is a schematic diagram of an optional flow chart of a decoding method provided in an embodiment of the present application.
  • FIG15 is a schematic diagram of an optional process for reconstructing point cloud geometric information based on occupancy probability and local density provided in an embodiment of the present application.
  • FIG16 is a schematic diagram of an optional flow chart of an encoding method provided in an embodiment of the present application.
  • FIG17 is a schematic diagram of an optional process of voxel downsampling provided in an embodiment of the present application.
  • FIG18 is a schematic diagram of an occupancy symbol obtained by voxel downsampling according to an embodiment of the present application.
  • FIG19 is a schematic diagram of an optional process of applying the decoding method provided in an embodiment of the present application to an actual scenario.
  • FIG20 is a schematic diagram of an optional structure of a decoder provided in an embodiment of the present application.
  • FIG21 is a schematic diagram of an optional structure of an encoder provided in an embodiment of the present application.
  • FIG22 is a schematic diagram of an optional structure of a decoder provided in an embodiment of the present application.
  • FIG23 is a schematic diagram of an optional structure of an encoder provided in an embodiment of the present application.
  • The terms "first/second/third" involved are merely used to distinguish similar objects and do not represent a specific ordering of the objects. It can be understood that "first/second/third" can be interchanged in a specific order or sequence where permitted, so that the embodiments of the present application described here can be implemented in an order other than that illustrated or described here.
  • Voxel is short for volume element, the smallest unit of digital data in a three-dimensional space partition. Voxels can be used to divide 3D space into grids and assign a feature to each grid cell; for example, a voxel can be a cubic block of fixed size in three-dimensional space. Voxels are widely used in fields such as three-dimensional imaging, scientific data and medical imaging.
  • Point cloud compression algorithms include: Video-based Point Cloud Compression (V-PCC) and Geometry-based Point Cloud Compression (G-PCC).
  • the geometry compression in G-PCC is mainly implemented through the octree model and/or the triangle surface model.
  • V-PCC is mainly implemented through 3D to 2D projection and video compression.
  • neural networks are applied to geometry-based point cloud compression technology.
  • the point cloud geometry compression technology based on neural networks can be roughly divided into geometric lossy compression and lossless compression.
  • the lossless compression algorithm mainly revolves around the design of the prediction model of voxel occupancy probability.
  • the data representation of voxels usually uses octree models, volume models, sparse tensor representations, etc.
  • On the encoder side, it is often necessary to use surrounding context such as parent nodes and neighbor nodes as input; after processing by neural network layers (such as convolution and fully connected layers), the occupancy probability of each voxel in the geometric data of the point cloud is output, and an entropy encoder then converts the voxel occupancy symbols corresponding to these occupancy probabilities into a bitstream.
  • On the decoder side, the occupancy probability of each voxel is predicted by the same process, and the voxel occupancy symbols are decoded from the bitstream based on the predicted occupancy probabilities to reconstruct the geometric data of the point cloud.
  • When the occupancy symbol of a voxel is determined based only on its occupancy probability, the determination is easily inaccurate when the occupancy probabilities of multiple adjacent voxels are close or the occupancy probability threshold is set unreasonably. Especially when the point cloud density distribution is uneven, the number of voxels whose occupancy symbols mark them as occupied easily deviates from the actual number of occupied voxels, producing too many or too few points in the reconstructed geometric information. This reduces the quality of geometric reconstruction and further reduces the encoding and decoding performance.
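  • As a toy illustration of this failure mode (the probabilities below are invented for illustration and are not taken from the application), a fixed threshold can mark more voxels as occupied than are actually present, whereas a predicted per-parent count cannot:

```python
import numpy as np

# Hypothetical occupancy probabilities for the 8 child voxels of one parent
# voxel; assume only 2 children are truly occupied, but several probabilities
# cluster just above a fixed threshold of 0.5.
probs = np.array([0.58, 0.56, 0.54, 0.53, 0.31, 0.22, 0.15, 0.08])

thresholded = (probs > 0.5).astype(np.uint8)
print(thresholded.sum())        # 4 -> two spurious points in the reconstruction

# With a predicted local density of 2, only the top-2 probabilities are kept.
order = np.argsort(probs)[::-1]
density_based = np.zeros(8, dtype=np.uint8)
density_based[order[:2]] = 1
print(density_based.sum())      # 2 -> matches the true number of occupied voxels
```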
  • the embodiments of the present application provide a coding and decoding method, a decoder, an encoder and a computer-readable storage medium, which can improve coding and decoding efficiency and improve coding and decoding performance.
  • a flow chart of G-PCC encoding and a flow chart of G-PCC decoding are first provided. It should be noted that the flow chart of G-PCC encoding and the flow chart of G-PCC decoding described in the embodiments of the present application are only for more clearly illustrating the technical solution of the embodiments of the present application, and do not constitute a limitation on the technical solution provided by the embodiments of the present application.
  • the point cloud compressed in the embodiments of the present application can be a point cloud in a video, but is not limited to this.
  • the point cloud of the input 3D image model is sliced and each slice is encoded independently.
  • the point cloud data is first divided into multiple slices by strip division, and in each slice the geometric information and attribute information of the point cloud are encoded separately. In the geometric coding process, the geometric information is transformed so that the whole point cloud is contained in a bounding box, and then quantized. Quantization mainly plays a scaling role; because quantization rounds coordinates, the geometric information of some points becomes identical, and whether duplicate points are removed can be determined by parameters. The process of quantization and duplicate-point removal is also called voxelization. The bounding box is then subjected to octree division.
  • the bounding box is divided into 8 sub-cubes, and each non-empty sub-cube (one containing points of the point cloud) is further divided into 8 equal parts, until the leaf nodes obtained by the division are 1×1×1 unit cubes. The division then stops, and the points in the leaf nodes are arithmetically encoded to generate a binary geometric bit stream, that is, the geometry code stream.
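  • The quantization and duplicate-point removal (voxelization) step described above can be sketched as follows; this is a minimal numpy illustration, and the uniform `voxel_size` parameter is an assumption:

```python
import numpy as np

def voxelize(points, voxel_size=1.0):
    """Quantize point coordinates to a voxel grid and remove duplicate points.

    `points` is an (N, 3) float array; the result is an (M, 3) integer array
    of occupied voxel coordinates, with M <= N because points that quantize
    to the same voxel are merged into one occupied voxel.
    """
    grid = np.floor(points / voxel_size).astype(np.int64)  # scaling + rounding
    return np.unique(grid, axis=0)                         # duplicate-point removal
```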
  • For geometry coding based on triangle soup (trisoup), octree division must also be performed first. However, trisoup does not need to divide the point cloud step by step into unit cubes with an edge length of 1; instead, the division stops when the edge length of the sub-block reaches W. Based on the surface formed by the distribution of the point cloud in each block, the intersections (vertices) between that surface and the twelve edges of the block are obtained, at most twelve per block, and the vertices are arithmetically encoded (with surface fitting based on the intersections) to generate a binary geometric bit stream, that is, the geometry code stream. The vertices are also used in the geometric reconstruction process, and the reconstructed geometric information is used when encoding the attributes of the point cloud.
  • color conversion is performed to convert the color information (i.e., attribute information) from the RGB color space to the YUV color space. Then, the point cloud is recolored using the reconstructed geometric information so that the unencoded attribute information corresponds to the reconstructed geometric information.
  • Attribute encoding is then performed using a lifting transform based on level of detail (LOD) or a region adaptive hierarchical transform (RAHT), and the resulting coefficients are quantized. The geometric encoding data obtained after octree division and surface fitting, and the attribute encoding data obtained after coefficient quantization, are slice-synthesized, and the vertex coordinates of each block are encoded in turn (i.e., arithmetic encoding) to generate a binary attribute bit stream, i.e., the attribute code stream.
  • the flowchart of G-PCC decoding shown in Figure 2 is applied to the decoder.
  • the decoder obtains a binary code stream, and independently decodes the geometric bit stream (i.e., geometric code stream) and attribute bit stream in the binary code stream.
  • For the geometric bit stream (i.e., geometry code stream), the geometric information of the point cloud is obtained through arithmetic decoding, octree synthesis, surface fitting, geometry reconstruction and inverse coordinate transformation;
  • for the attribute bit stream, the attribute information of the point cloud is obtained through arithmetic decoding, inverse quantization, LOD-based inverse lifting or RAHT-based inverse transformation, and inverse color conversion; the three-dimensional image model of the point cloud is then restored based on the geometric information and attribute information.
  • the encoding method of the embodiment of the present application can be applied to the geometric information encoding process of the G-PCC as shown in Figure 1.
  • the geometric encoding process in Figure 1 is performed based on the voxelized second-scale point cloud to obtain a geometric bit stream;
  • voxel downsampling is performed on the voxelized second-scale point cloud to determine the first-scale point cloud, and the first-scale voxels in the first-scale point cloud are upsampled to the second scale to determine the second-scale voxels corresponding to the first-scale voxels;
  • local density prediction is performed based on the first-scale point cloud to determine the local density corresponding to the first-scale voxels, and the occupation probability prediction is performed on the second-scale voxels to determine the occupation probability corresponding to the second-scale voxels;
  • the local density represents the number of occupied second-scale voxels among the second-scale voxels corresponding to the first-scale voxels; the reconstructed geometric information corresponding to the second-scale point cloud is then determined according to the local density corresponding to the first-scale voxels and the occupancy probability corresponding to the second-scale voxels.
  • the decoding method of the embodiment of the present application can be applied to the geometric information decoding process of G-PCC as shown in FIG2.
  • the coding information corresponding to the second-scale point cloud is determined, and the first-scale point cloud is determined; the first-scale point cloud is the previously decoded point cloud data corresponding to the second-scale point cloud.
  • the coding information corresponding to the second-scale point cloud is subjected to arithmetic decoding, octree synthesis, and surface fitting.
  • local density prediction is performed based on the first-scale point cloud to determine the local density corresponding to the first-scale voxel in the first-scale point cloud
  • occupation probability prediction is performed on the second-scale voxel to determine the occupation probability corresponding to the second-scale voxel
  • the second-scale voxel is the upsampled voxel corresponding to the first-scale voxel
  • the local density represents the number of occupied second-scale voxels in the second-scale voxel corresponding to the first-scale voxel
  • the coding information corresponding to the second-scale point cloud is decoded and reconstructed to determine the reconstructed geometric information corresponding to the second-scale point cloud.
  • the reconstructed geometric information corresponding to the second-scale point cloud is used to perform LOD-based inverse lifting or RAHT-based inverse transformation-inverse color conversion to obtain the attribute information of the second-scale point cloud, and the three-dimensional image model of the second-scale point cloud is restored based on the reconstructed geometric information and attribute information.
  • the encoding method and decoding method of the embodiments of the present application can also be used in other point cloud encoding and decoding processes besides G-PCC.
  • Figure 3 is an optional flowchart of a decoding method provided in an embodiment of the present application, which will be explained in conjunction with the steps shown in Figure 3.
  • the decoder parses the received code stream to obtain the encoding information corresponding to the second scale point cloud and determines the first scale point cloud, wherein the first scale point cloud is the previously decoded point cloud data corresponding to the second scale point cloud.
  • the code stream generally includes coding information corresponding to at least one scale of point cloud sent by the encoder.
  • the decoder decodes in order from low scale to high scale. That is, before decoding the coding information corresponding to the second-scale point cloud, the decoder has already completed decoding the coding information of the preceding scale and obtained the decoded point cloud data of that scale, i.e., the first-scale point cloud.
  • the decoder can use the decoded low-scale first-scale point cloud data to decode the coding information corresponding to the high-scale second-scale point cloud and reconstruct the geometric information.
  • S102 performing local density prediction based on the first-scale point cloud to determine the local density corresponding to the first-scale voxels in the first-scale point cloud, and performing occupancy probability prediction on the second-scale voxels to determine the occupancy probability corresponding to the second-scale voxels.
  • the geometric information of the first-scale point cloud and the second-scale point cloud has undergone a voxelization process of the encoder and can be represented in the form of a voxel grid.
  • a point in the point cloud may correspond to an occupied voxel (i.e., a non-empty voxel), and an unoccupied voxel (i.e., an empty voxel) indicates that there is no point in the point cloud at the voxel position.
  • the occupied voxels may be marked as 1, and the unoccupied voxels may be marked as 0.
  • the voxelized point cloud may represent the geometric data of the point cloud by the occupation symbols of the voxels at each position in the voxel grid.
  • the scale of the point cloud corresponds to the scale of the voxels in the point cloud. That is, the voxels contained in the first-scale point cloud are first-scale voxels, and the voxels contained in the second-scale point cloud are second-scale voxels. Among them, the second-scale voxels are upsampled voxels corresponding to the first-scale voxels.
  • the decoder can perform voxel upsampling on the first-scale voxels in the first-scale point cloud. For example, the first-scale voxels representing the occupied points in the first-scale point cloud are upsampled to obtain multiple second-scale voxels corresponding to each first-scale voxel.
  • the decoder can implement voxel upsampling by pooling, such as using a maximum pooling layer with a step size of 2×2×2 to divide one first-scale voxel in the first-scale point cloud into eight second-scale voxels.
  • Each upsampling evenly divides the size of the first-scale voxel in three dimensions, that is, the size of the second-scale voxel in three dimensions is half of the first-scale voxel.
  • each first-scale voxel in the first-scale point cloud completes the upsampling from the low-scale point cloud with known geometric information to the high-scale point cloud, and obtains the second-scale point cloud whose geometric information is to be reconstructed.
  • Figure 4 shows a first-scale point cloud containing 2×2×1 first-scale voxels. After one voxel upsampling, a second-scale point cloud containing 4×4×2 second-scale voxels is obtained.
  • the first-scale point cloud is the decoded point cloud data.
  • the occupied first-scale voxels are represented by solid cubes, which represent the locations of the points in the point cloud.
  • After the first-scale voxels are upsampled to second-scale voxels, whether each second-scale voxel is occupied still needs to be determined through the geometric reconstruction process; such undetermined voxels are represented by empty cubes in Figure 4.
  • the point cloud in Figure 4 is only exemplary, and the actual point cloud may include more voxels.
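  • A minimal sketch of this 2×2×2 voxel upsampling on integer voxel coordinates is shown below (numpy-based; the convention of doubling parent coordinates and adding binary offsets is an assumption consistent with halving the voxel size in each dimension):

```python
import numpy as np

# The 8 positional offsets of the child voxels inside one parent voxel.
CHILD_OFFSETS = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)])

def upsample_voxels(first_scale_coords):
    """Map each occupied first-scale voxel to its 8 candidate second-scale voxels."""
    parents = np.asarray(first_scale_coords, dtype=np.int64)
    children = parents[:, None, :] * 2 + CHILD_OFFSETS[None, :, :]
    return children.reshape(-1, 3)

# A 2x2x1 block of first-scale voxels (assuming all 4 are occupied) yields
# 4 * 8 = 32 candidate second-scale voxels, as in Figure 4.
coords = np.array([[0, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 0]])
assert upsample_voxels(coords).shape == (32, 3)
```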
  • the local density represents the number of occupied second-scale voxels in the second-scale voxels corresponding to the first-scale voxels. Therefore, the decoder performs local density prediction based on the first-scale point cloud, determines the local density corresponding to the first-scale voxels in the first-scale point cloud, and can determine the number of occupied second-scale voxels in the multiple second-scale voxels corresponding to each first-scale voxel. In this way, the occupied second-scale voxels in the multiple second-scale voxels can be more accurately determined by combining the multiple occupation probabilities corresponding to the multiple second-scale voxels.
  • the decoder predicts the occupancy probability of the second-scale voxels obtained by upsampling the first-scale voxels, and determines multiple occupancy probabilities corresponding to multiple second-scale voxels corresponding to the first-scale voxels.
  • S102 may be implemented by executing the process of S201 - S203 as follows:
  • S201 extracting features from geometric information of a first-scale point cloud to determine features of the first-scale point cloud.
  • the geometric information of the first scale point cloud may include the occupancy of each voxel in the first scale point cloud and the position information of each voxel.
  • the occupancy may be a preset flag such as 0 or 1
  • the position information may be the three-dimensional coordinates of the voxel.
  • the geometric information of the first scale point cloud may be obtained by decoding and reconstructing the encoded information of the first scale point cloud, that is, the geometric information of the first scale point cloud may be equivalent to the reconstructed geometric information of the first scale point cloud.
  • the decoder extracts features from the geometric information of the first-scale point cloud, maps the geometric information of the first-scale point cloud to a preset low-scale feature space, and determines the features of the first-scale point cloud.
  • the first-scale point cloud features may include implicit features extracted from the geometric information of the first-scale point cloud.
  • the decoder upsamples the first-scale point cloud features to the second scale to determine the initial second-scale point cloud features.
  • the decoder extracts features from the initial second-scale point cloud features to determine the second-scale point cloud features.
  • the decoder predicts the occupancy probability of the second-scale voxels based on the second-scale point cloud features, i.e., predicts the probability that a point of the point cloud falls at the voxel position corresponding to each second-scale voxel, and determines the occupancy probability corresponding to each second-scale voxel.
  • the second-scale point cloud features are obtained by performing two feature extractions on the geometric information of the first-scale point cloud. Therefore, the second-scale point cloud features have better feature expression, which can improve the accuracy of the decoder's prediction of occupancy probability using the second-scale point cloud features.
  • the decoder performs local density prediction based on the first-scale point cloud features, and predicts the number of occupied second-scale voxels in multiple second-scale voxels corresponding to each first-scale voxel in the first-scale point cloud as the local density corresponding to each first-scale voxel.
  • the local density is a numerical value, which can be an integer value greater than or equal to 1 and less than or equal to the total number of second-scale voxels corresponding to a first-scale voxel.
  • the execution order of the feature upsampling, feature extraction of the initial second-scale point cloud features, and occupancy probability prediction processes in S202 and the local density prediction process in S203 is not limited to that shown in FIG. 5 . In practical applications, they can also be executed in any order or simultaneously, depending on the actual situation, and the embodiment of the present application does not limit this.
  • the decoder extracts features of the geometric information of the first-scale point cloud through the third feature extraction network to determine the first-scale point cloud features; the first-scale point cloud features are respectively input into the first branch including the upsampling network, the fourth feature extraction network and the occupancy probability prediction network, and the second branch including the local density prediction network; the occupancy probability corresponding to each second-scale voxel is output through the first branch, and the local density corresponding to each first-scale voxel is output through the second branch.
  • the upsampling network can be implemented by a transposed convolutional network; the third feature extraction network is used to extract features of the geometric information of the first-scale point cloud, and the fourth feature extraction network is used to extract features of the initial second-scale point cloud features; the occupancy probability prediction network and the local density prediction network can be implemented using pre-trained neural networks.
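  • The shared-feature, two-branch structure described above (Figure 6) can be sketched as follows. This is a minimal PyTorch illustration in which dense 3-D convolutions stand in for the sparse convolutions; the layer widths, kernel sizes and the mapping of the density branch output to the range [1, 8] are assumptions, not taken from the application:

```python
import torch
import torch.nn as nn

class TwoBranchPredictor(nn.Module):
    # Branch 1: upsampling + feature extraction + occupancy probability head.
    # Branch 2: local density head operating on the shared first-scale features.
    def __init__(self, channels=32):
        super().__init__()
        self.feature_extractor = nn.Sequential(      # "third feature extraction network"
            nn.Conv3d(1, channels, 3, padding=1), nn.ReLU())
        self.upsample = nn.ConvTranspose3d(           # transposed-convolution upsampling
            channels, channels, kernel_size=2, stride=2)
        self.refine = nn.Sequential(                  # "fourth feature extraction network"
            nn.Conv3d(channels, channels, 3, padding=1), nn.ReLU())
        self.occupancy_head = nn.Sequential(          # occupancy probability per child voxel
            nn.Conv3d(channels, 1, 3, padding=1), nn.Sigmoid())
        self.density_head = nn.Sequential(            # local density per parent voxel
            nn.Conv3d(channels, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, occupancy_grid):                # (N, 1, D, H, W) grid of 0/1 symbols
        feats = self.feature_extractor(occupancy_grid)   # first-scale point cloud features
        up = self.refine(self.upsample(feats))           # initial -> second-scale features
        prob = self.occupancy_head(up)                   # branch 1, (N, 1, 2D, 2H, 2W)
        density = 1 + 7 * self.density_head(feats)       # branch 2, mapped to [1, 8] (assumed)
        return prob, density
```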
  • S102 in FIG. 3 may also be implemented by executing the process of S301-S304 as follows:
  • S302 up-sample the first point cloud features at the first scale to the second scale, determine the second-scale point cloud features, and perform occupancy probability prediction based on the second-scale point cloud features to determine the occupancy probability corresponding to the second-scale voxels.
  • the decoder can also extract different first-scale point cloud features, that is, first point cloud features and second point cloud features, through different feature extraction networks; these are used for occupancy probability prediction and local density prediction, respectively.
  • the first feature extraction network can be obtained by jointly learning or training with a neural network for occupancy probability prediction
  • the second feature extraction network can be obtained by jointly learning or training with a local density prediction network.
  • the embodiment of the present application does not limit the execution order of the S301 - S302 process and the S303 - S304 process.
  • the first branch in FIG7 includes a first feature extraction network, an upsampling network, and an occupancy probability prediction network;
  • the second branch includes a second feature extraction network and a local density prediction network.
  • the decoder inputs the geometric information of the first-scale point cloud into the first branch and the second branch respectively, and performs feature extraction and network prediction on the geometric information of the first-scale point cloud through the first branch and the second branch respectively; the occupancy probability corresponding to each second-scale voxel is output through the first branch, and the local density corresponding to each first-scale voxel is output through the second branch.
  • the second-scale point cloud features output by the upsampling network may be further extracted, and the second-scale point cloud features after further feature extraction may be input into the occupancy probability prediction network for occupancy probability prediction to enhance feature expression, thereby improving the accuracy of occupancy probability prediction.
  • the specific selection is made according to the actual situation, and the embodiments of the present application are not limited thereto.
  • S102 in FIG. 3 may also be implemented by executing the process of S401-S403 as follows:
  • S401 extracting features from geometric information of a first-scale point cloud to determine features of the first-scale point cloud.
  • S402 up-sample the first-scale point cloud features to a second scale, determine the second-scale point cloud features, and perform occupation probability prediction based on the second-scale point cloud features to determine the occupation probability corresponding to the second-scale voxels.
  • the occupancy probability prediction and the local density prediction can be performed based on the first scale point cloud features determined by feature extraction of the geometric information of the first scale point cloud.
  • the feature information can be reused, the module complexity can be reduced, and the processing efficiency of the decoder can be improved.
  • the embodiment of the present application does not limit the execution order of S402 and S403.
  • the above S401-S403 process can be shown in FIG8.
  • the decoder extracts features from the geometric information of the first-scale point cloud through the fifth feature extraction network to determine the first-scale point cloud features; the first-scale point cloud features are respectively input into the first branch including the upsampling network and the occupancy probability prediction network, and the second branch including the local density prediction network; the occupancy probability corresponding to each second-scale voxel is output through the first branch, and the local density corresponding to each first-scale voxel is output through the second branch.
  • the fifth feature extraction network in FIG8 and the third feature extraction network in FIG6 can be the same or different feature extraction networks.
  • a local density prediction network can be used to perform local density prediction based on the first-scale point cloud features to determine the local density corresponding to the first-scale voxel.
  • the local density prediction network may include: a first sparse convolution layer, a first activation function layer, a second sparse convolution layer, and a second activation function layer.
  • the local density prediction network may be shown in FIG9, including: a first layer of sparse convolution layer (i.e., a first sparse convolution layer), a second layer of ReLu activation function layer (i.e., a first activation function layer), a third layer of sparse convolution layer (i.e., a second sparse convolution layer), and a fourth layer of Sigmoid function layer (i.e., a second activation function layer).
  • the local density prediction network is used to output local density according to implicit features of the point cloud.
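  • A sketch of the four-layer local density prediction network of Figure 9, again with nn.Conv3d as a dense stand-in for the sparse convolution layers; the channel widths and the final integer mapping are assumptions:

```python
import torch.nn as nn

local_density_net = nn.Sequential(
    nn.Conv3d(32, 32, kernel_size=3, padding=1),  # first sparse convolution layer
    nn.ReLU(),                                    # first activation function layer (ReLU)
    nn.Conv3d(32, 1, kernel_size=3, padding=1),   # second sparse convolution layer
    nn.Sigmoid(),                                 # second activation function layer (Sigmoid)
)
# The Sigmoid output s in (0, 1) still has to be mapped to an integer local
# density in [1, 8]; round(1 + 7 * s) is one plausible mapping (an assumption).
```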
  • the feature extraction process in S201, S202, S301, S303, S401, and S402 can be implemented using a feature extraction network as shown in FIG10, including: a sparse convolution layer of the first layer, an activation function layer of the second layer (e.g., a ReLu activation function layer), a residual layer of the third layer, and a sparse convolution layer of the fourth layer.
  • the network structure of the residual layer of the third layer can be as shown in FIG11.
  • the occupancy probability prediction process in S202, S302, and S402 can be implemented using an occupancy probability prediction network as shown in FIG12, including: a sparse convolution layer of the first layer, an activation function ReLu layer of the second layer, a sparse convolution layer of the third layer, an activation function ReLu of the fourth layer, a sparse convolution layer of the fifth layer, and a Sigmoid function layer of the sixth layer.
  • the occupancy probability prediction network is used to perform occupancy prediction on each second-scale voxel obtained by upsampling each first-scale voxel according to the implicit features of the point cloud of the second scale, and obtain the occupancy probability corresponding to each second-scale voxel.
  • the occupancy probability prediction network in FIG12 is only an example. In actual applications, it can also be a convolutional neural network (CNN) of other hierarchical structures. The specific selection is made according to the actual situation, and the embodiments of the present application are not limited thereto.
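  • Correspondingly, the six-layer occupancy probability prediction network of Figure 12 can be sketched as follows (again a dense stand-in for the sparse convolutions; channel widths are illustrative assumptions):

```python
import torch.nn as nn

occupancy_prob_net = nn.Sequential(
    nn.Conv3d(32, 32, kernel_size=3, padding=1),  # layer 1: sparse convolution
    nn.ReLU(),                                    # layer 2: ReLU activation
    nn.Conv3d(32, 16, kernel_size=3, padding=1),  # layer 3: sparse convolution
    nn.ReLU(),                                    # layer 4: ReLU activation
    nn.Conv3d(16, 1, kernel_size=3, padding=1),   # layer 5: sparse convolution
    nn.Sigmoid(),                                 # layer 6: occupancy probability in (0, 1)
)
```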
  • If a first-scale voxel in the first-scale point cloud is unoccupied, the multiple second-scale voxels obtained by upsampling it are also unoccupied, and no probability prediction is required. Only the multiple second-scale voxels obtained by upsampling the occupied first-scale voxels in the first-scale point cloud need to be predicted.
  • the occupancy probabilities corresponding to the second-scale voxels on the side of the second-scale point cloud in Figure 4 facing the reader can be as shown in Figure 13.
  • the occupancy probability in Figure 13 is only for convenience of explanation and cannot be understood as the result of actual calculation.
  • the decoder uses the predicted local density corresponding to each first-scale voxel in the first-scale point cloud, together with the occupancy probability corresponding to each second-scale voxel obtained by upsampling that first-scale voxel, to determine whether each second-scale voxel is occupied. Then, according to this occupancy, it decodes and reconstructs the encoded information corresponding to the second-scale point cloud, decodes the position information corresponding to each occupied second-scale voxel in the second-scale point cloud, and determines the reconstructed geometric information corresponding to the second-scale point cloud from that position information.
  • S103 may be implemented by executing the process of S1031 - S1032 as follows:
  • the multiple occupancy probabilities of the multiple second-scale voxels corresponding to each first-scale voxel can be screened according to the local density corresponding to that first-scale voxel: the local-density number of second-scale voxels with the highest occupancy probabilities are determined as the occupied second-scale voxels.
  • When a first-scale voxel is upsampled into 8 second-scale voxels, the local density corresponding to the first-scale voxel can be any value from 1 to 8.
  • For example, when the local density is 4, the decoder determines the 4 second-scale voxels with the highest occupancy probabilities among the 8 second-scale voxels as occupied second-scale voxels, and the other second-scale voxels as unoccupied second-scale voxels.
  • a first-scale voxel in the first-scale point cloud is upsampled to obtain 8 second-scale voxels, and the local density corresponding to the first-scale voxel is 4; the occupancy probabilities corresponding to the 8 second-scale voxels are, from high to low, [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]. Then, according to the local density, the second-scale voxels with occupancy probabilities of [0.9, 0.8, 0.7, 0.6] can be determined as occupied second-scale voxels; and the other second-scale voxels corresponding to [0.5, 0.4, 0.3, 0.2] are determined as unoccupied second-scale voxels.
  • the occupied second-scale voxels represent points in the point cloud that fall into the second-scale voxels obtained by upsampling the first-scale voxels.
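  • The worked example above can be reproduced with a short density-guided screening routine (a minimal numpy sketch; the function name is hypothetical):

```python
import numpy as np

def screen_by_local_density(child_probs, local_density):
    """Mark the `local_density` children with the highest occupancy
    probabilities as occupied (symbol 1) and the rest as unoccupied (0)."""
    probs = np.asarray(child_probs)
    order = np.argsort(probs)[::-1]            # indices sorted by probability, descending
    symbols = np.zeros(probs.shape, dtype=np.uint8)
    symbols[order[:local_density]] = 1
    return symbols

probs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
print(screen_by_local_density(probs, 4))       # -> [1 1 1 1 0 0 0 0]
```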
  • the decoder can decode and reconstruct the encoded information corresponding to the second-scale point cloud based on the occupied second-scale voxels corresponding to each first-scale voxel, thereby determining the reconstructed geometric information corresponding to the second-scale point cloud.
  • the decoder may distinguish between occupied second-scale voxels and unoccupied second-scale voxels with different occupancy symbols according to the determination result of S1031.
  • the encoded information corresponding to the second-scale point cloud may be decoded and reconstructed based on each occupied second-scale voxel represented by the occupancy symbol, and the reconstructed geometric information corresponding to the second-scale point cloud may be determined.
  • When the decoder performs geometric reconstruction, it mainly decodes and reconstructs the geometric coding information in the coding information corresponding to the second-scale point cloud. After the reconstructed geometric information corresponding to the second-scale point cloud is determined, the attribute coding information corresponding to the second-scale point cloud can be decoded based on that reconstructed geometric information through the attribute decoding process to determine the attribute information corresponding to the second-scale point cloud, and the reconstructed geometric information and the attribute information are combined to restore the three-dimensional image model of the second-scale point cloud.
  • the decoder can determine the number of occupied second-scale voxels among the second-scale voxels obtained by upsampling each first-scale voxel by predicting the local density. In this way, the occupancy probabilities corresponding to the second-scale voxels can be screened in combination with the local density to determine the occupancy of the second-scale voxels, reconstruct the second-scale point cloud according to the occupancy of the second-scale voxels, and determine the reconstructed geometric information of the second-scale point cloud.
  • the occupancy of the determined second-scale voxels can be made more accurate, and the accuracy of the reconstructed geometric information of the second-scale point cloud can be improved, that is, the reconstructed geometric quality of the decoder is improved, and the decoding performance is improved.
  • the decoder may continue to parse the code stream to determine the encoding information corresponding to the i-th scale point cloud, where i is an integer greater than or equal to 3. Through the above decoding method of the embodiments of the present application, local density prediction is performed based on the reconstructed geometric information of the (i-1)-th scale point cloud to determine the local density corresponding to the (i-1)-th scale voxels, and occupancy probability prediction is performed on the i-th scale voxels corresponding to the (i-1)-th scale voxels to determine the occupancy probability corresponding to the i-th scale voxels; the i-th scale is obtained by upsampling the (i-1)-th scale. Based on the occupancy probability corresponding to the i-th scale voxels and the local density corresponding to the (i-1)-th scale voxels, the encoding information corresponding to the i-th scale point cloud is decoded and reconstructed to determine the reconstructed geometric information corresponding to the i-th scale point cloud.
  • the decoder may also decode and reconstruct the encoding information of a higher scale point cloud in the code stream, such as a third scale point cloud, based on the reconstructed geometric information of the second scale point cloud, using the decoding method in the embodiment of the present application to obtain the reconstructed geometric information corresponding to the third scale point cloud.
  • When the decoder decodes adjacent scales, it can use the known geometric data of the previously decoded low-scale point cloud to decode and reconstruct the encoded information of the high-scale point cloud and determine the reconstructed geometric information of the high-scale point cloud. Decoding proceeds scale by scale until the reconstructed geometric information of the target-scale point cloud is restored, where the target scale can be determined according to the preset decoding accuracy of the decoder.
  • the decoder may also perform the above-mentioned decoding method of the embodiments of the present application to predict the local density of the n-th scale voxels and the occupancy probability of the (n+1)-th scale voxels based on the reconstructed geometric data corresponding to the n-th scale point cloud, and determine the local density corresponding to the n-th scale voxels and the occupancy probability corresponding to the (n+1)-th scale voxels; n is a positive integer greater than or equal to 2; the (n+1)-th scale is obtained by upsampling the n-th scale; based on the local density corresponding to the n-th scale voxels and the occupancy probability corresponding to the (n+1)-th scale voxels, the reconstructed geometric data corresponding to the (n+1)-th scale point cloud is determined.
  • On the basis of having decoded and reconstructed the first-scale point cloud, the decoder can perform voxel upsampling on the first-scale voxels in the first-scale point cloud to obtain the second-scale voxels; it extracts features from the geometric information of the first-scale point cloud, performs local density prediction for the first-scale voxels and occupancy probability prediction for the second-scale voxels, and reconstructs the geometric data corresponding to the second-scale point cloud according to the determined local density of the first-scale voxels and occupancy probability of the second-scale voxels, combined with the second position information of the second-scale voxels determined by upsampling the first position information of the first-scale voxels.
  • the decoding method in the embodiment of the present application can be applied to a scalable encoding and decoding method, that is, for multiple encoding information of multiple scale point clouds sent by the encoder side, the decoder can decode and reconstruct the point cloud of any scale in the order of decoding from low scale to high scale according to the actual decoding accuracy requirements.
  • For example, the encoder writes the encoding information of the first-scale point cloud through the encoding information of the fifth-scale point cloud into the code stream and sends it. According to the decoding method provided in the embodiments of the present application and the preset accuracy requirements, the decoder can decode from the first-scale point cloud up to the third-scale point cloud, reconstruct the geometric data of the third-scale point cloud and restore the three-dimensional image model of the third-scale point cloud, and then end decoding without continuing to decode the encoding information corresponding to the remaining scales up to the fifth-scale point cloud.
  • the specific selection is based on the actual situation, and the embodiment of the present application is not limited.
  • the decoding method provided in the embodiment of the present application can be repeatedly applied between multiple adjacent scales, and the decoding between each group of adjacent scales is independent of each other, so scale-scalable decoding can be flexibly implemented.
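  • The scale-by-scale procedure can be summarized in a short loop (a sketch; `predict_occupancy` is a hypothetical caller-supplied function wrapping the density/probability prediction and screening described above):

```python
import numpy as np

def decode_to_target_scale(base_coords, predict_occupancy, num_steps):
    """Scalable decoding sketch: each step uses only the previously decoded
    lower-scale geometry, so decoding can stop at any target scale."""
    coords = np.asarray(base_coords, dtype=np.int64)
    offsets = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)])
    for _ in range(num_steps):                        # adjacent-scale steps are independent
        children = (coords[:, None, :] * 2 + offsets).reshape(-1, 3)
        mask = predict_occupancy(coords, children)    # boolean mask over candidate voxels
        coords = children[mask]                       # keep only the occupied children
    return coords

# Toy usage: keep every candidate (as if the local density were 8) for two steps.
out = decode_to_target_scale([[0, 0, 0]], lambda p, c: np.ones(len(c), bool), 2)
print(out.shape)  # (64, 3)
```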
  • each decoding process of the above decoder uses the decoded low-scale point cloud as known information to decode the encoded information of the high-scale point cloud.
  • the known information may be a preset number of unencoded point cloud information sent by the encoder side.
  • the encoder may send a preset number of point cloud information, such as the coordinates of 100 points in the point cloud, as the first known information directly to the decoding end in an unencoded manner, so that the decoder does not need to decode the first known information, but directly uses the position information of the preset number of points sent by the encoder to reconstruct the point cloud of the corresponding scale to continue the subsequent decoding process.
  • Figure 16 is an optional flow chart of the encoding method provided in an embodiment of the present application, which will be explained in conjunction with the steps shown in Figure 16.
  • the encoder voxelizes the original point cloud data of the second scale point cloud, and the voxelized point cloud can represent the geometric data of the point cloud by the occupation symbol of the voxel at each position in the voxel grid.
  • the encoder performs voxel downsampling on the voxelized second scale point cloud to determine the first scale point cloud.
  • the encoder can implement voxel downsampling by pooling. As shown in FIG17, a maximum pooling layer with a step size of 2×2×2 is used to merge 8 second-scale voxels into 1 first-scale voxel. After voxel downsampling, 3 of the 4 first-scale voxels corresponding to the first-scale point cloud are occupied, and 1 first-scale voxel is not occupied. The size of the first-scale voxel in three dimensions is twice that of the second-scale voxel. The encoder marks the occupancy of each voxel with an occupancy symbol.
  • The process of obtaining, by voxel downsampling, the occupancy symbols corresponding to the first-scale voxels on the side of the first-scale point cloud facing the reader from the occupancy symbols corresponding to the second-scale voxels on the side of the second-scale point cloud facing the reader can be shown in FIG18. In this way, voxel downsampling yields the geometric data of the relatively low-scale first-scale point cloud.
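  • A minimal sketch of this 2×2×2 downsampling on integer voxel coordinates is shown below. As a by-product it counts the occupied children of each parent voxel, which, by the definition above, is exactly the local density of that first-scale voxel (using this count as a training target is an assumption):

```python
import numpy as np

def downsample_voxels(second_scale_coords):
    """Merge each 2x2x2 group of occupied second-scale voxels into one
    first-scale voxel and count the occupied children per parent."""
    children = np.asarray(second_scale_coords, dtype=np.int64)
    parents = children // 2        # max-pooling on occupancy: any occupied child occupies the parent
    unique_parents, counts = np.unique(parents, axis=0, return_counts=True)
    return unique_parents, counts  # counts in 1..8 give the local density per parent voxel
```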
  • the encoder then upsamples the first-scale point cloud to the second scale, determines a plurality of second-scale voxels corresponding to each first-scale voxel, and implements a lossless encoding process by predicting the occupancy of the second-scale voxels.
  • the local density represents the number of occupied second-scale voxels in the second-scale voxels corresponding to the first-scale voxels.
  • the encoder performs local density prediction based on the first-scale point cloud, determines the local density corresponding to the first-scale voxels, and predicts the occupancy probability of the second-scale voxels.
  • the process of determining the occupancy probability corresponding to the second-scale voxels is the same as the same processing method in the decoder, and will not be repeated here.
  • the process of S502 may include: performing feature extraction on the geometric information of the first-scale point cloud to determine the first-scale point cloud features; upsampling the first-scale point cloud features to the second scale to determine the initial second-scale point cloud features, performing feature extraction on the initial second-scale point cloud features to determine the second-scale point cloud features, and performing occupancy probability prediction based on the second-scale point cloud features to determine the occupancy probability corresponding to the second-scale voxel; performing local density prediction based on the first-scale point cloud features to determine the local density corresponding to the first-scale voxel.
  • the process of S502 may also include: performing feature extraction on the geometric information of the first-scale point cloud through a first feature extraction network to determine the first point cloud features of the first scale; upsampling the first point cloud features of the first scale to a second scale to determine the second-scale point cloud features, and performing occupancy probability prediction based on the second-scale point cloud features to determine the occupancy probability corresponding to the second-scale voxel; performing feature extraction on the geometric data of the first-scale point cloud through a second feature extraction network to determine the second point cloud features of the first scale; performing local density prediction based on the second point cloud features of the first scale to determine the local density corresponding to the first-scale voxel.
  • the process of S502 may also include: extracting features from the geometric information of the first-scale point cloud to determine the first-scale point cloud features; upsampling the first-scale point cloud features to the second scale to determine the second-scale point cloud features, and performing occupancy probability prediction based on the second-scale point cloud features to determine the occupancy probability corresponding to the second-scale voxel; performing local density prediction based on the first-scale point cloud features to determine the local density corresponding to the first-scale voxel.
  • the encoder can use the local density prediction network to perform local density prediction according to the first-scale point cloud features to determine the local density corresponding to the first-scale voxel; wherein the local density prediction network includes: a first sparse convolution layer, a first activation function layer, a second sparse convolution layer and a second activation function layer.
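  • for illustration only, a minimal sketch of this two-branch prediction is given below. It is not the application's implementation: the embodiments use sparse convolutions on voxelized point clouds, which are replaced here by dense 3D convolutions (torch.nn.Conv3d) so that the example stays self-contained; the class name, channel widths and kernel sizes are our assumptions.

```python
import torch
import torch.nn as nn

class TwoBranchPredictor(nn.Module):
    """Illustrative sketch of S502: one branch predicts per-child occupancy
    probabilities at the second scale, the other predicts the local density
    of each first-scale voxel. Channel widths are arbitrary assumptions."""

    def __init__(self, channels: int = 32):
        super().__init__()
        # Feature extraction on the first-scale geometry (occupancy grid).
        self.feat = nn.Sequential(
            nn.Conv3d(1, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        # 2x2x2 transposed convolution: upsample first-scale features to the
        # second scale (each first-scale voxel maps to 8 second-scale voxels).
        self.upsample = nn.ConvTranspose3d(channels, channels, kernel_size=2, stride=2)
        # Occupancy-probability head at the second scale.
        self.occ_head = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(channels, 1, kernel_size=1), nn.Sigmoid(),
        )
        # Local-density head at the first scale: two convolution layers, each
        # followed by an activation, mirroring the layer structure named above
        # (with dense convolutions standing in for the sparse ones).
        self.density_head = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(channels, 1, kernel_size=1), nn.ReLU(),
        )

    def forward(self, first_scale_grid: torch.Tensor):
        # first_scale_grid: (N, 1, D, H, W) binary occupancy of the first scale.
        f1 = self.feat(first_scale_grid)        # first-scale features
        f2 = self.upsample(f1)                  # initial second-scale features
        occupancy_prob = self.occ_head(f2)      # (N, 1, 2D, 2H, 2W), values in [0, 1]
        local_density = self.density_head(f1)   # (N, 1, D, H, W), non-negative
        return occupancy_prob, local_density
```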
  • S503 Determine the reconstructed geometric information corresponding to the second-scale point cloud according to the local density corresponding to the first-scale voxel and the occupancy probability corresponding to the second-scale voxel.
  • the encoder determines the reconstructed geometric information corresponding to the second-scale point cloud based on the local density corresponding to the first-scale voxel and the occupancy probability corresponding to the second-scale voxel, in a manner consistent with the corresponding processing in the decoder, which is not repeated here.
  • S503 may be implemented by executing the process of S5031-S5032 as follows (a sketch of both steps is given after this list):
  • S5031 For each first-scale voxel in the first-scale point cloud, determine, among the multiple second-scale voxels corresponding to that first-scale voxel, the local-density number of second-scale voxels with the highest occupancy probabilities as the occupied second-scale voxels.
  • S5032 Determine reconstructed geometric information corresponding to the second-scale point cloud based on the occupied second-scale voxels corresponding to each first-scale voxel.
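  • a coordinate-list sketch of S5031-S5032 follows, assuming 2× upsampling (8 candidate children per first-scale voxel); rounding and clamping the predicted density to an integer in [1, 8] is our assumption, and all names are illustrative.

```python
import numpy as np

def reconstruct_second_scale(parent_coords, child_probs, local_density):
    """Sketch of S5031-S5032 under an assumed 2x upsampling.

    parent_coords: (M, 3) integer coordinates of occupied first-scale voxels
    child_probs:   (M, 8) predicted occupancy probability of each child voxel
    local_density: (M,)   predicted number of occupied children per parent
    Returns the (K, 3) coordinates of the occupied second-scale voxels.
    """
    # The 8 child offsets of a 2x2x2 upsampling.
    offsets = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)])
    # Round the predicted density to an integer k in [1, 8] (our assumption).
    k = np.clip(np.rint(local_density).astype(int), 1, 8)
    occupied = []
    for coords, probs, ki in zip(parent_coords, child_probs, k):
        # S5031: keep the ki children with the highest occupancy probability.
        top = np.argsort(probs)[::-1][:ki]
        # S5032: child coordinate = 2 * parent coordinate + offset.
        occupied.append(coords * 2 + offsets[top])
    return np.concatenate(occupied, axis=0)
```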
  • encoding is performed based on the reconstructed geometric information corresponding to the second-scale point cloud, encoding information corresponding to the second-scale point cloud is determined, and the encoding information is written into a bitstream.
  • the encoder can perform recoloring processing based on the reconstructed geometric information corresponding to the second-scale point cloud to obtain colored point cloud data, perform color information encoding based on the colored point cloud data, and determine the attribute information encoding corresponding to the second-scale point cloud; perform geometric information encoding on the point cloud data of the second-scale point cloud to determine the geometric information encoding corresponding to the second-scale point cloud; and combine the geometric information encoding and the attribute information encoding to determine the encoding information corresponding to the second-scale point cloud.
  • entropy coding may use a context-based adaptive binary arithmetic coding (CABAC: Context-based Adaptive Binary Arithmetic Coding) algorithm, but is not limited thereto.
  • the encoder writes the code corresponding to the second-scale point cloud into the bitstream and sends it to the decoder, which parses the encoding information corresponding to the second-scale point cloud.
  • with the decoding method provided in the embodiment of the present application, the geometric data of the previously decoded low-scale point cloud (such as the first-scale point cloud) is fed to the entropy decoder so that the geometric data of the second-scale point cloud can be reconstructed losslessly; that is, the reconstructed geometric information corresponding to the second-scale point cloud is determined, after which the geometric decoding and attribute decoding of the encoding information of the second-scale point cloud are performed based on that reconstructed geometric information, restoring the three-dimensional image model of the second-scale point cloud.
  • by screening the occupancy probabilities of the second-scale voxels with the local density, the accuracy of determining the occupancy status of the second-scale voxels can be improved, which in turn improves the accuracy of encoding the reconstructed geometric information of the second-scale point cloud determined from that occupancy status, thereby improving the encoding performance.
  • the encoder can also perform at least one voxel downsampling based on the first-scale point cloud through the same encoding process, determine a point cloud with a lower scale than the first-scale point cloud, and complete the encoding of multiple-scale point clouds scale by scale.
  • the encoder writes the encoding information of the multiple-scale point clouds into the bitstream and sends it to the decoder.
  • the encoding method provided in the embodiment of the present application can be repeatedly applied between multiple adjacent scales, and the encoding between each group of adjacent scales is independent of each other, so scale-scalable encoding can be flexibly implemented.
  • the encoder performs G-PCC encoding on the original point cloud data of the first scale point cloud to obtain the encoding information corresponding to the first scale point cloud, and sends it to the decoder through the code stream.
  • the decoder performs G-PCC decoding on the encoding information corresponding to the first scale point cloud data, and determines the reconstructed geometric information of the first scale point cloud through the geometric decoding process in G-PCC decoding.
  • taking the geometric decoding process in G-PCC decoding as an example, the decoder extracts features from the reconstructed geometric information of the first scale point cloud to obtain first scale point cloud features, and upsamples the first scale point cloud features to the second scale by a 2×2×2 transposed convolution to determine the initial second scale point cloud features; the decoder then extracts features from the initial second scale point cloud features to determine the second scale point cloud features.
  • the decoder predicts the occupancy probability of multiple second scale voxels obtained by upsampling each first scale voxel in the first scale point cloud based on the second scale point cloud features, and obtains multiple occupancy probabilities corresponding to the multiple second scale voxels; and predicts the local density of each first scale voxel based on the first scale point cloud features to obtain the local density corresponding to each first scale voxel.
  • the decoder screens, according to the local density corresponding to each first-scale voxel, the multiple occupancy probabilities of the multiple second-scale voxels upsampled from that first-scale voxel; illustratively, the multiple occupancy probabilities are sorted from high to low, and the local-density number of second-scale voxels with the highest occupancy probabilities are determined as the occupied second-scale voxels.
  • the reconstructed geometric information of the second-scale point cloud is determined according to the occupied second-scale voxels corresponding to each first-scale voxel in the first-scale point cloud; based on the reconstructed geometric information of the second-scale point cloud, the reconstructed geometric information of the third-scale point cloud is obtained by the same process (see the sketch below).
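  • a dense-grid sketch of this scale-by-scale refinement is given below, reusing the TwoBranchPredictor sketched earlier; the regrouping of child probabilities and the per-parent top-k selection follow the screening described above, while the density rounding and the dense (rather than sparse) tensors are again our assumptions.

```python
import torch

def refine_one_scale(grid: torch.Tensor, predictor) -> torch.Tensor:
    """One scale-to-scale refinement step, reusing the TwoBranchPredictor
    sketched earlier. grid: (1, 1, D, H, W) binary occupancy at the current
    scale; returns a (1, 1, 2D, 2H, 2W) grid at the next scale."""
    D, H, W = grid.shape[-3:]
    with torch.no_grad():
        occ_prob, density = predictor(grid)
    # Regroup the (2D, 2H, 2W) child probabilities into (D, H, W, 8):
    # the 8 children of parent (i, j, k) live at (2i+a, 2j+b, 2k+c).
    p8 = (occ_prob[0, 0].view(D, 2, H, 2, W, 2)
          .permute(0, 2, 4, 1, 3, 5).reshape(D, H, W, 8))
    # Round the predicted local density to an integer k in [1, 8] (assumed).
    k = density[0, 0].round().clamp(1, 8).long()
    # Rank the children of each parent by probability (rank 0 = most probable)
    # and keep the top-k children of every occupied parent.
    ranks = p8.argsort(dim=-1, descending=True).argsort(dim=-1)
    parent_occ = grid[0, 0] > 0
    child_occ = (ranks < k.unsqueeze(-1)) & parent_occ.unsqueeze(-1)
    # Scatter the (D, H, W, 8) decisions back to a dense next-scale grid.
    new_grid = (child_occ.reshape(D, H, W, 2, 2, 2)
                .permute(0, 3, 1, 4, 2, 5).reshape(2 * D, 2 * H, 2 * W))
    return new_grid.float()[None, None]

def progressive_decode(grid, predictor, num_refinements):
    # Repeat the same refinement between every pair of adjacent scales,
    # e.g. first scale -> second scale -> third scale.
    for _ in range(num_refinements):
        grid = refine_one_scale(grid, predictor)
    return grid
```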
  • the embodiment of the present application has good geometric reconstruction quality and better encoding and decoding performance than the traditional G-PCC method.
  • the applicant conducted a comparative encoding and decoding test on multiple point cloud data sets using the encoding and decoding method of the embodiment of the present application and the traditional G-PCC method. The results are shown in Table 1, as follows:
Table 1:

Point cloud dataset                BD-rate gain of the embodiment over traditional G-PCC
facade_00009_vox12                 -25.91%
house_without_roof_00057_vox12     -43.11%
boxer_viewdep_vox12                -42.76%
soldier_viewdep_vox12              -45.60%
Average gain                       -39.35%
  • facade_00009_vox12, house_without_roof_00057_vox12, boxer_viewdep_vox12 and soldier_viewdep_vox12 are different point cloud datasets; the table reports the BD-rate gain of the embodiment of the present application over the traditional G-PCC method.
  • the rate-distortion (Bjontegaard-Delta, BD-rate) value is negatively correlated with codec performance: the more negative the BD-rate, the fewer bits are needed for the same reconstruction quality.
  • the embodiment of the present application achieves a larger BD-rate gain, up to 45.60% and 39.35% on average; this data illustrates the improvement in codec performance.
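  • for context, BD-rate compares two rate-distortion curves by fitting each as a cubic polynomial of quality (e.g., PSNR) versus log-rate and integrating the gap over the common quality range; the sketch below shows this standard computation (background on the metric, not part of the application, with made-up sample numbers).

```python
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    """Bjontegaard-Delta rate: average percentage bitrate change of 'test'
    versus 'ref' over the overlapping quality range. Negative means 'test'
    needs fewer bits for the same quality."""
    lr_ref, lr_test = np.log10(rates_ref), np.log10(rates_test)
    # Fit log-rate as a cubic polynomial of PSNR (needs >= 4 rate points).
    p_ref = np.polyfit(psnr_ref, lr_ref, 3)
    p_test = np.polyfit(psnr_test, lr_test, 3)
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    # Integrate both fits over the common PSNR interval.
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_diff = (int_test - int_ref) / (hi - lo)   # mean log10-rate difference
    return (10 ** avg_diff - 1) * 100             # percent bitrate change

# Example with made-up numbers: the test codec uses 0.6x the bitrate at every
# quality point, so the BD-rate is about -40%.
print(bd_rate([1.0, 2.0, 4.0, 8.0], [60, 64, 68, 72],
              [0.6, 1.2, 2.4, 4.8], [60, 64, 68, 72]))
```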
  • the embodiment of the present application provides a decoder 1, as shown in FIG20 , including:
  • the parsing part 11 is configured to parse the bit stream and determine the encoding information corresponding to the second scale point cloud;
  • the determining part 12 is configured to determine a first scale point cloud; the first scale point cloud is the previously decoded point cloud data corresponding to the second scale point cloud;
  • the first prediction part 13 is configured to perform local density prediction based on the first scale point cloud, determine the local density corresponding to the first scale voxel in the first scale point cloud, and perform occupancy probability prediction on the second scale voxel to determine the occupancy probability corresponding to the second scale voxel;
  • the second scale voxel is an upsampled voxel corresponding to the first scale voxel;
  • the local density represents the number of occupied second scale voxels in the second scale voxel corresponding to the first scale voxel;
  • the decoding and reconstruction part 14 is configured to decode and reconstruct the encoded information corresponding to the second scale point cloud based on the occupancy probability corresponding to the second scale voxel and the local density corresponding to the first scale voxel, and determine the reconstructed geometric information corresponding to the second scale point cloud.
  • the decoding and reconstruction part 14 is further configured to, for each first-scale voxel in the first-scale point cloud, determine, among the multiple second-scale voxels corresponding to each first-scale voxel, the local-density number of second-scale voxels with the highest occupancy probabilities as occupied second-scale voxels; and, based on the occupied second-scale voxels corresponding to each first-scale voxel, decode and reconstruct the encoded information corresponding to the second-scale point cloud to determine the reconstructed geometric information corresponding to the second-scale point cloud.
  • the first prediction part 13 is further configured to perform feature extraction on the geometric information of the first-scale point cloud to determine the first-scale point cloud features; upsample the first-scale point cloud features to the second scale to determine the initial second-scale point cloud features, perform feature extraction on the initial second-scale point cloud features to determine the second-scale point cloud features, and perform occupancy probability prediction based on the second-scale point cloud features to determine the occupancy probability corresponding to the second-scale voxel; perform local density prediction based on the first-scale point cloud features to determine the local density corresponding to the first-scale voxel.
  • the first prediction part 13 is further configured to perform feature extraction on the geometric information of the first-scale point cloud through a first feature extraction network to determine the first point cloud features of the first scale; upsample the first point cloud features of the first scale to a second scale to determine the second-scale point cloud features, and perform occupancy probability prediction based on the second-scale point cloud features to determine the occupancy probability corresponding to the second-scale voxel; perform feature extraction on the geometric data of the first-scale point cloud through a second feature extraction network to determine the second point cloud features of the first scale; perform local density prediction based on the second point cloud features of the first scale to determine the local density corresponding to the first-scale voxel.
  • the first prediction part 13 is further configured to perform feature extraction on the geometric information of the first-scale point cloud to determine the first-scale point cloud features; upsample the first-scale point cloud features to the second scale to determine the second-scale point cloud features, and perform occupancy probability prediction based on the second-scale point cloud features to determine the occupancy probability corresponding to the second-scale voxel; perform local density prediction based on the first-scale point cloud features to determine the local density corresponding to the first-scale voxel.
  • the first prediction part 13 is further configured to use a local density prediction network to perform local density prediction according to the first-scale point cloud features to determine the local density corresponding to the first-scale voxel.
  • the local density prediction network includes:
  • the first sparse convolution layer, the first activation function layer, the second sparse convolution layer and the second activation function layer.
  • the parsing part 11 is further configured to parse the code stream to determine the encoding information corresponding to the i-th scale point cloud; i is an integer greater than or equal to 3; the first prediction part 13 is further configured to perform local density prediction of the (i-1)-th scale voxel based on the reconstructed geometric information of the (i-1)-th scale point cloud, determine the local density corresponding to the (i-1)-th scale voxel, and perform occupancy probability prediction on the i-th scale voxel corresponding to the (i-1)-th scale voxel to determine the occupancy probability corresponding to the i-th scale voxel; the i-th scale is obtained by upsampling the (i-1)-th scale; the decoding and reconstruction part 14 is further configured to decode and reconstruct the encoding information corresponding to the i-th scale point cloud based on the occupancy probability corresponding to the i-th scale voxel and the local density corresponding to the (i-1)-th scale voxel, and determine the reconstructed geometric information corresponding to the i-th scale point cloud.
  • the first prediction part 13 is further configured to perform local density prediction of the n-th scale voxel and occupancy probability prediction of the n+1-th scale voxel based on the reconstructed geometric data corresponding to the n-th scale point cloud, and determine the local density corresponding to the n-th scale voxel and the occupancy probability corresponding to the n+1-th scale voxel; n is a positive integer greater than or equal to 2; the n+1-th scale is obtained by upsampling the n-th scale; the decoding and reconstruction part 14 is further configured to determine the reconstructed geometric data corresponding to the n+1-th scale point cloud based on the local density corresponding to the n-th scale voxel and the occupancy probability corresponding to the n+1-th scale voxel.
  • the embodiment of the present application provides an encoder 2, as shown in FIG21, including:
  • a downsampling part 21 is configured to perform voxel downsampling on the second scale point cloud to determine a first scale point cloud;
  • the second prediction part 22 is configured to perform local density prediction based on the first scale point cloud to determine the local density corresponding to the first scale voxel, and perform occupancy probability prediction on the second scale voxel to determine the occupancy probability corresponding to the second scale voxel; the local density represents the number of occupied second scale voxels in the second scale voxels corresponding to the first scale voxels;
  • a reconstruction part 23 configured to determine the reconstructed geometric information corresponding to the second-scale point cloud according to the local density corresponding to the first-scale voxel and the occupancy probability corresponding to the second-scale voxel;
  • the encoding part 24 is configured to perform encoding based on the reconstructed geometric information corresponding to the second-scale point cloud, determine the encoding information corresponding to the second-scale point cloud, and write the encoding information into a bitstream.
  • the reconstruction part 23 is further configured to, for each first-scale voxel in the first-scale point cloud, determine, among the multiple second-scale voxels corresponding to each first-scale voxel, the local-density number of second-scale voxels with the highest occupancy probabilities as occupied second-scale voxels; and determine the reconstructed geometric information corresponding to the second-scale point cloud based on the occupied second-scale voxels corresponding to each first-scale voxel.
  • the second prediction part 22 is further configured to perform feature extraction on the geometric information of the first-scale point cloud to determine the first-scale point cloud features; upsample the first-scale point cloud features to the second scale to determine the initial second-scale point cloud features, perform feature extraction on the initial second-scale point cloud features to determine the second-scale point cloud features, and perform occupancy probability prediction based on the second-scale point cloud features to determine the occupancy probability corresponding to the second-scale voxel; perform local density prediction based on the first-scale point cloud features to determine the local density corresponding to the first-scale voxel.
  • the second prediction part 22 is further configured to perform feature extraction on the geometric information of the first-scale point cloud through a first feature extraction network to determine the first point cloud features of the first scale; upsample the first point cloud features of the first scale to a second scale to determine the second-scale point cloud features, and perform occupancy probability prediction based on the second-scale point cloud features to determine the occupancy probability corresponding to the second-scale voxel; perform feature extraction on the geometric data of the first-scale point cloud through a second feature extraction network to determine the second point cloud features of the first scale; perform local density prediction based on the second point cloud features of the first scale to determine the local density corresponding to the first-scale voxel.
  • the second prediction part 22 is further configured to perform feature extraction on the geometric information of the first-scale point cloud to determine the first-scale point cloud features; upsample the first-scale point cloud features to the second scale to determine the second-scale point cloud features, and perform occupancy probability prediction based on the second-scale point cloud features to determine the occupancy probability corresponding to the second-scale voxel; perform local density prediction based on the first-scale point cloud features to determine the local density corresponding to the first-scale voxel.
  • the second prediction part 22 is further configured to use a local density prediction network to perform local density prediction based on the first-scale point cloud features to determine the local density corresponding to the first-scale voxel; wherein the local density prediction network includes: a first sparse convolution layer, a first activation function layer, a second sparse convolution layer and a second activation function layer.
  • the encoding part 24 is further configured to perform recoloring processing based on the reconstructed geometric information corresponding to the second-scale point cloud to obtain colored point cloud data, perform color information encoding based on the colored point cloud data, and determine the attribute information encoding corresponding to the second-scale point cloud; perform geometric information encoding on the point cloud data of the second-scale point cloud to determine the geometric information encoding corresponding to the second-scale point cloud; and combine the geometric information encoding and the attribute information encoding to determine the encoding information corresponding to the second-scale point cloud.
  • the embodiment of the present application further provides a decoder
  • FIG22 is an optional structural diagram of the decoder 3 provided in the embodiment of the present application.
  • the decoder 3 includes: a first memory 32 and a first processor 33.
  • the first memory 32 and the first processor 33 are connected through a first communication bus 34;
  • the first memory 32 is used to store executable instructions;
  • the first processor 33 is used to execute the executable instructions stored in the first memory 32, and implement the decoding method provided in the embodiment of the present application.
  • the embodiment of the present application further provides an encoder
  • FIG23 is an optional structural diagram of the encoder 4 provided in the embodiment of the present application.
  • the encoder 4 includes: a second memory 42 and a second processor 43.
  • the second memory 42 and the second processor 43 are connected via a second communication bus 44;
  • the second memory 42 is used to store executable instructions;
  • the second processor 43 is used to execute the executable instructions stored in the second memory 42, and implement the encoding method provided in the embodiment of the present application.
  • An embodiment of the present application provides a computer-readable storage medium in which executable instructions are stored.
  • When the executable instructions are executed by a first processor, the first processor is caused to execute any one of the decoding methods provided in the embodiments of the present application; or, when the executable instructions are executed by a second processor, the second processor is caused to execute any one of the encoding methods provided in the embodiments of the present application.
  • the computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface storage, optical disk, or CD-ROM; or it may be various devices including one or any combination of the above memories.
  • executable instructions may be in the form of a program, software, software module, script or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine or other unit suitable for use in a computing environment.
  • executable instructions may, but do not necessarily, correspond to a file in a file system, may be stored as part of a file that stores other programs or data, such as, for example, in one or more scripts in a Hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files storing one or more modules, subroutines, or code portions).
  • executable instructions may be deployed to be executed on one computing device, or on multiple computing devices located at one site, or on multiple computing devices distributed across multiple sites and interconnected by a communication network.
  • the embodiments of the present application may be provided as methods, systems, or computer program products. Therefore, the present application may adopt the form of hardware embodiments, software embodiments, or embodiments in combination with software and hardware. Moreover, the present application may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage, etc.) that contain computer-usable program code.
  • each flow process and/or box in the flow chart and/or block diagram and the combination of the flow process and/or box in the flow chart and/or block diagram can be realized by computer program instructions.
  • These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more flows of the flowchart and/or one or more boxes of the block diagram.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured product including an instruction device that implements the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device so that a series of operational steps are executed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.
  • the embodiment of the present application provides a coding and decoding method, a decoder, an encoder and a computer-readable storage medium.
  • the decoder can determine the number of occupied second-scale voxels in the second-scale voxels obtained by sampling each first-scale voxel by predicting the local density. In this way, the occupancy probability corresponding to the second-scale voxel can be screened in combination with the local density to determine the occupancy of the second-scale voxel, reconstruct the second-scale point cloud according to the occupancy of the second-scale voxel, and determine the reconstructed geometric information of the second-scale point cloud.
  • in this way, the determined occupancy of the second-scale voxels is more accurate, which improves the accuracy of the reconstructed geometric information of the second-scale point cloud, the reconstruction quality at the decoder, and hence the decoding performance.
  • likewise, at the encoder, screening the occupancy probabilities of the second-scale voxels improves the accuracy of determining their occupancy, and thus the accuracy of the reconstructed geometric information of the second-scale point cloud used for encoding, thereby improving the encoding performance.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)

Abstract

Embodiments of the present application relate to an encoding method, a decoding method, a decoder, an encoder and a computer-readable storage medium, which can improve encoding and decoding performance. The decoding method comprises: parsing a bitstream to determine encoding information corresponding to a second-scale point cloud, and determining a first-scale point cloud; performing local density prediction based on the first-scale point cloud to determine a local density corresponding to first-scale voxels in the first-scale point cloud, and performing occupancy probability prediction on second-scale voxels to determine an occupancy probability corresponding to the second-scale voxels, the second-scale voxels being upsampled voxels corresponding to the first-scale voxels, and the local density representing the number of occupied second-scale voxels among the second-scale voxels corresponding to the first-scale voxels; and, based on the occupancy probability corresponding to the second-scale voxels and the local density corresponding to the first-scale voxels, decoding and reconstructing the encoding information corresponding to the second-scale point cloud, so as to determine reconstructed geometric information corresponding to the second-scale point cloud.
PCT/CN2022/125742 2022-10-17 2022-10-17 Encoding method, decoding method, decoder, encoder and computer-readable storage medium WO2024082105A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/125742 WO2024082105A1 (fr) Encoding method, decoding method, decoder, encoder and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/125742 WO2024082105A1 (fr) Encoding method, decoding method, decoder, encoder and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2024082105A1 true WO2024082105A1 (fr) 2024-04-25

Family

ID=90736577

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/125742 WO2024082105A1 (fr) Encoding method, decoding method, decoder, encoder and computer-readable storage medium

Country Status (1)

Country Link
WO (1) WO2024082105A1 (fr)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220321912A1 (en) * 2019-08-09 2022-10-06 Lg Electronics Inc. Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
CN113766228A (zh) * 2020-06-05 2021-12-07 Oppo广东移动通信有限公司 Point cloud compression method, encoder, decoder and storage medium
US20220108492A1 (en) * 2020-10-06 2022-04-07 Qualcomm Incorporated Gpcc planar mode and buffer simplification
CN113613010A (zh) * 2021-07-07 2021-11-05 南京大学 Point cloud geometry lossless compression method based on sparse convolutional neural network
CN114926636A (zh) * 2022-05-12 2022-08-19 合众新能源汽车有限公司 Point cloud semantic segmentation method, apparatus, device and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HE YUN; REN XINLIN; TANG DANHANG; ZHANG YINDA; XUE XIANGYANG; FU YANWEI: "Density-preserving Deep Point Cloud Compression", 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), IEEE, 18 June 2022 (2022-06-18), pages 2323 - 2332, XP034194523, DOI: 10.1109/CVPR52688.2022.00237 *
J. WANG, Z. MA (NANJING UNIVERSITY), H. WEI (OPPO), Y. YU (OPPO), V. ZAKHARCHENKO (OPPO), D. WANG(OPPO): "[G-PCC EE13.54] A Geometry Compression Framework for AI-based PCC via Sparse Convolution", 135. MPEG MEETING; 20210712 - 20210716; ONLINE; (MOTION PICTURE EXPERT GROUP OR ISO/IEC JTC1/SC29/WG11), 6 July 2021 (2021-07-06), XP030297111 *

Similar Documents

Publication Publication Date Title
CN113615181B (zh) Method and apparatus for point cloud encoding and decoding
US9734595B2 (en) Method and apparatus for near-lossless compression and decompression of 3D meshes and point clouds
WO2021000658A1 (fr) Point cloud encoding and decoding method, encoder, decoder and computer storage medium
JP7233561B2 (ja) Method, apparatus and computer program for point cloud compression
CN113613010A (zh) Point cloud geometry lossless compression method based on sparse convolutional neural network
WO2022121649A1 (fr) Point cloud data encoding and decoding method, point cloud data processing method and apparatus, electronic device, computer program product and computer-readable storage medium
JP7408799B2 (ja) Compression of a neural network model
JP7486883B2 (ja) Method and apparatus for Haar-based point cloud coding
CN111641826B (zh) Method, apparatus and system for encoding and decoding data
CN111727445A (zh) Data compression with local entropy coding
KR102650334B1 (ko) Method and apparatus for point cloud coding
WO2024082105A1 (fr) Encoding method, decoding method, decoder, encoder and computer-readable storage medium
TW202406344A (zh) Point cloud geometry data augmentation and encoding/decoding methods, apparatuses, bitstream, codec, system and storage medium
US20230086264A1 Decoding method, encoding method, decoder, and encoder based on point cloud attribute prediction
CN113382244B (zh) Encoding/decoding network structure, image compression method, apparatus and storage medium
JP7394980B2 (ja) Method, apparatus and program for decoding a neural network with block partitioning
CN115393452A (zh) Point cloud geometry compression method based on an asymmetric autoencoder structure
WO2024011417A1 (fr) Encoding method, decoding method, decoder, encoder and computer-readable storage medium
WO2024082101A1 (fr) Encoding method, decoding method, decoder, encoder, bitstream and storage medium
RU2778864C1 Implicit quadtree- or binary-tree-based geometry partitioning for point cloud coding
WO2024082152A1 (fr) Encoding and decoding methods and apparatuses, encoder and decoder, bitstream, device and storage medium
WO2023205969A1 (fr) Point cloud geometric information compression and decompression methods and apparatuses, and point cloud video encoding and decoding methods and apparatuses
WO2024011427A1 (fr) Point cloud inter-frame compensation method and apparatus, point cloud encoding and decoding methods and apparatuses, and system
WO2023181872A1 (fr) Information processing device and method
US20240087176A1 Point cloud decoding method and apparatus, point cloud encoding method and apparatus, computer device, computer-readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22962303

Country of ref document: EP

Kind code of ref document: A1