CN115393452A

CN115393452A - Point cloud geometric compression method based on asymmetric self-encoder structure

Info

Publication number: CN115393452A
Application number: CN202210902132.8A
Authority: CN
Inventors: 方志军; 庄乐辉; 田瑾; 姜晓燕; 谭清宇
Original assignee: Shanghai University of Engineering Science
Current assignee: Shanghai University of Engineering Science
Priority date: 2022-07-29
Filing date: 2022-07-29
Publication date: 2022-11-25

Abstract

The invention discloses a point cloud geometric compression method based on an asymmetric self-encoder structure, belonging to the technical field of point cloud compression. The scheme comprises the following steps: first, preprocessing a pre-acquired point cloud data set to obtain training data; constructing an asymmetric point cloud geometric compression network model, which comprises an asymmetric encoder network and decoder network; constructing a multi-scale weighted distortion loss function and training the asymmetric point cloud geometric compression network; and finally, inputting point cloud data into the trained network to realize point cloud compression. The invention achieves better compression performance on 3D point cloud data: its rate-distortion performance is far superior to that of the MPEG methods G-PCC and V-PCC. Moreover, at similar bit rates, the point clouds reconstructed by the method look better than those reconstructed by the MPEG methods.

Description

Point cloud geometric compression method based on asymmetric self-encoder structure

Technical Field

The invention relates to the technical field of point cloud compression, in particular to a point cloud geometric compression method based on an asymmetric self-encoder structure.

Background

Point clouds are widely used in some emerging industries (e.g. 3D modeling, AR/VR, immersive communication) because of their high resolution and high fidelity. The massive amount of point cloud data promotes the need for an efficient point cloud compression method.

The Moving Picture Experts Group (MPEG) has proposed two well-known point cloud compression methods: video-based point cloud compression (V-PCC) for dynamic point clouds and geometry-based point cloud compression (G-PCC) for static point clouds. V-PCC projects the three-dimensional point cloud into two-dimensional space and then encodes it using an image/video encoder, whereas G-PCC directly encodes the three-dimensional information using a three-dimensional model such as an octree or a triangular surface. With the rise of deep learning, researchers have begun exploring the correlation of voxels in 3D space using 3D convolution. These learning-based approaches use deep neural networks (DNNs), and most of them adopt a variational autoencoder (VAE) architecture, which enables better compression by extracting compact latent feature representations.

Although learning-based point cloud compression algorithms have achieved excellent rate-distortion performance, some problems remain to be solved. First, existing algorithms commonly use a symmetric self-encoder structure, which may not achieve optimal performance in rate-distortion optimization. Existing algorithms employ structurally identical encoders and decoders; however, the encoder is in fact relatively more important, since the encoder affects both losses of the rate-distortion optimization (rate and distortion), while the decoder affects only the distortion loss.

The edge area of the point cloud is usually difficult to recover, and how to reconstruct and obtain the point cloud with higher quality is an urgent problem to be solved.

In the aspect of distortion calculation, the current multi-scale loss simply sums the point cloud distortion losses under all scales, and the influence of the distortion losses of different scales on the reconstructed point cloud is ignored.

Disclosure of Invention

Based on the problems, the invention provides a point cloud geometric compression method based on an asymmetric self-encoder structure, which realizes better rate distortion performance and higher point cloud reconstruction quality.

The specific scheme comprises the following steps:

(1) Preprocessing a point cloud data set acquired in advance to obtain training data;

(2) Constructing an asymmetric point cloud geometric compression network model, wherein the asymmetric point cloud geometric compression network comprises an asymmetric encoder network and a decoder network;

(3) Constructing a multi-scale weighted distortion loss function, and training the asymmetric point cloud geometric compression network;

(4) Inputting the point cloud data into the trained asymmetric point cloud geometric compression network to realize point cloud compression.

Further, preprocessing the pre-acquired point cloud data set includes:

randomly sampling the point cloud data set to obtain point cloud data with a random number of points;

randomly rotating the point cloud data to increase data diversity;

and obtaining training data that meets the requirements through coordinate quantization.

Further, constructing the asymmetric point cloud geometric compression network model comprises:

the encoder network is designed to be more complex than the decoder network, so that better rate-distortion performance is realized;

the encoder network comprises a plurality of groups of down-sampling modules and attention modules; down-sampled point cloud geometric information and attribute information are obtained through the encoder network and are compressed separately to realize point cloud encoding;

the decoder network comprises a plurality of groups of up-sampling modules and wide receptive field modules; the compressed files are input into the decoder network, the number of points is recovered through up-sampling, and a higher-quality reconstructed point cloud is obtained through the wide receptive field modules, yielding the decoded data.

Further, the down-sampling module is realized by a convolution module with a stride of 2;

the attention module comprises an attention branch and a parallel residual branch; the attention branch extracts features through a residual module and applies a sigmoid nonlinear activation to obtain an attention mask; the parallel residual branch extracts a point cloud feature map through three parallel residual networks; the point cloud feature map and the attention mask are multiplied element by element to obtain the down-sampled point cloud geometric information and attribute information.

Further, the up-sampling module is realized by a transposed convolution module with a stride of 2;

the wide receptive field module consists of a wide receptive field network and a residual module; the wide receptive field module extracts feature information through a wide receptive field, and the feature information is input into the residual module to obtain a high-quality reconstructed point cloud.

Further, compressing the geometric information and the attribute information of the down-sampled point cloud respectively includes:

losslessly compressing the point cloud geometric information through an octree encoder, which ensures the accuracy of the geometric information;

and lossily compressing the point cloud attribute information through an arithmetic coder, in which the attribute information is quantized and its conditional probability estimation is then improved by a hyperprior.

Further, the multi-scale weighted distortion loss function is constructed as follows:

at each scale, whether each voxel of the reconstructed point cloud is occupied is judged through a binary (two-class) cross-entropy loss, expressed as:

$$D_k = -\frac{1}{N}\sum_{i=1}^{N}\left[x_i \log p_i + (1 - x_i)\log(1 - p_i)\right]$$

wherein $x_i$ is the true label of the current voxel, $p_i$ is the predicted probability that the voxel is occupied, $N$ is the number of points in the generated point cloud, and $k$ is the serial number of the decoding layer;

the multi-scale weighted distortion loss is constructed from the binary cross-entropy losses, expressed as:

$$D = \sum_{k} \delta_k D_k$$

wherein $D_k$ is the binary cross-entropy loss at scale $k$, and $\delta_k$ is the corresponding distortion weighting coefficient.

The invention also provides a point cloud geometric compression system based on the asymmetric self-encoder structure, which comprises the following components:

an acquisition module, used for acquiring a point cloud data set;

a modeling module, used for constructing an asymmetric point cloud geometric compression network model, wherein the asymmetric point cloud geometric compression network comprises an asymmetric encoder network and a decoder network;

a training module, used for constructing a multi-scale weighted distortion loss function and training the asymmetric point cloud geometric compression network;

a verification module, used for inputting the point cloud data into the trained asymmetric point cloud geometric compression network to realize point cloud compression.

The invention also provides a device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the point cloud geometric compression method based on an asymmetric self-encoder structure when executing the computer program.

The present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to execute a method of geometrical compression of a point cloud based on an asymmetric self-encoder structure.

The invention has the beneficial effects that:

The invention achieves better compression performance on 3D point cloud data: its rate-distortion performance is far superior to that of the MPEG methods G-PCC and V-PCC. Moreover, at similar bit rates, the point clouds reconstructed by the method look better than those reconstructed by the MPEG methods.

Drawings

FIG. 1 is a schematic flow chart of a point cloud geometric compression method based on an asymmetric self-encoder structure according to an embodiment of the present invention.

FIG. 2 is a schematic structural diagram of the network used by the point cloud geometric compression method based on an asymmetric self-encoder structure according to an embodiment of the present invention.

FIG. 3 is a schematic structural diagram of an attention module according to an embodiment of the present invention.

FIG. 4 is a schematic structural diagram of a wide receptive field module according to an embodiment of the present invention.

FIG. 5 shows rate-distortion curves comparing the method of the embodiment of the present invention with the V-PCC, G-PCC, and PCGCv2 methods.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments.

As shown in FIG. 1, an embodiment of the present invention provides a point cloud geometric compression method based on an asymmetric self-encoder structure, including the following steps:

S101, preprocessing a point cloud data set acquired in advance to obtain training data;

In this embodiment, the 3D shape data set ShapeNet is used. ShapeNet is randomly sampled to obtain point clouds with a random number of points. Data diversity is increased through random rotation, and each coordinate of the point cloud is quantized to 9-bit precision, yielding a data set that meets the requirements.
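The three preprocessing operations can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the function name `preprocess`, the sampling range (`min_points`, `max_points`), and the choice of a uniformly random rotation are hypothetical and are not specified in the text; only the 9-bit precision comes from the embodiment.

```python
import numpy as np

def preprocess(points, rng, min_points=1000, max_points=5000, bits=9):
    """Random sampling, random rotation, and coordinate quantization."""
    # Random sampling: keep a random number of points (range is an assumption).
    n = min(int(rng.integers(min_points, max_points + 1)), len(points))
    sampled = points[rng.choice(len(points), size=n, replace=False)]

    # Random rotation: the QR decomposition of a Gaussian matrix gives an
    # orthogonal matrix; the sign fixes below turn it into a proper rotation.
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    rotated = sampled @ q.T

    # Quantization: shift to the origin, scale the largest extent to
    # 2**bits - 1, and round to integer voxel coordinates.
    rotated = rotated - rotated.min(axis=0)
    scale = (2 ** bits - 1) / rotated.max()
    voxels = np.round(rotated * scale).astype(np.int32)

    # Points that fall into the same voxel collapse into one.
    return np.unique(voxels, axis=0)

rng = np.random.default_rng(0)
cloud = rng.uniform(-1.0, 1.0, size=(6000, 3))  # stand-in for a sampled shape
voxels = preprocess(cloud, rng)
```

With `bits=9`, every output coordinate lies in the integer range 0 to 511.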

S102, constructing an asymmetric point cloud geometric compression network model, wherein the asymmetric point cloud geometric compression network comprises an asymmetric encoder network and a decoder network;

A more complex encoder network is designed relative to the decoder network, so that better rate-distortion performance is realized.

The encoder network comprises a plurality of groups of down-sampling modules and attention modules. Through successive down-sampling, the encoder network outputs a more compact point cloud, while the attention modules focus on edge areas. The down-sampled point cloud geometric information and attribute information are obtained through the encoder network and are compressed separately to realize point cloud encoding.

The down-sampling module is realized by a convolution module with a stride of 2, and the attention module comprises an attention branch and a parallel residual branch.
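To illustrate why successive stride-2 steps produce a more compact point cloud, the sketch below tracks only the occupied voxel coordinates. It assumes a sparse voxel representation in which a stride-2 convolution maps every 2x2x2 block of input voxels onto one output voxel; the text does not name a convolution library, and the learned features are not modeled. The name `downsample_coords` is hypothetical.

```python
import numpy as np

def downsample_coords(coords, stride=2):
    """Occupied voxel coordinates after one stride-`stride` step."""
    # Voxels in the same block share the same integer-divided coordinate.
    return np.unique(coords // stride, axis=0)

coords = np.array([[0, 0, 0], [1, 1, 1], [2, 3, 4], [7, 7, 7]])
level1 = downsample_coords(coords)   # 3 voxels remain
level2 = downsample_coords(level1)   # still 3 voxels, on a coarser grid
```

The first two input voxels fall into the same block, so four points become three after one step, and the coordinate range halves at every level.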

As shown in FIG. 3, the attention branch extracts features through a residual module and applies a sigmoid nonlinear activation to obtain an attention mask; the parallel residual branch extracts a point cloud feature map through three parallel residual networks; the point cloud feature map and the attention mask are multiplied element by element to obtain the down-sampled point cloud geometric information and attribute information.
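The masking operation can be sketched on dense per-voxel feature vectors as follows. This is a toy illustration, not the network of FIG. 3: the residual units are plain linear maps with ReLU instead of 3D convolutions, the names `residual_unit` and `attention_module` are hypothetical, and chaining the three residual units one after another is an assumption, since the text does not state how they are combined.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def residual_unit(x, w):
    # Toy residual unit: identity path plus a ReLU-activated linear map.
    return x + np.maximum(x @ w, 0.0)

def attention_module(x, trunk_weights, attn_weight):
    # Residual branch: three residual units produce the feature map
    # (chained here, which is an assumption).
    trunk = x
    for w in trunk_weights:
        trunk = residual_unit(trunk, w)
    # Attention branch: residual unit followed by sigmoid, giving a mask
    # with every entry strictly between 0 and 1.
    mask = sigmoid(residual_unit(x, attn_weight))
    # Element-wise product of the feature map and the attention mask.
    return trunk * mask, mask

rng = np.random.default_rng(0)
features = rng.normal(size=(5, 4))            # 5 voxels, 4 channels
trunk_w = [0.1 * rng.normal(size=(4, 4)) for _ in range(3)]
attn_w = 0.1 * rng.normal(size=(4, 4))
out, mask = attention_module(features, trunk_w, attn_w)
```

Because the mask lies between 0 and 1, it can only attenuate features, which lets the network keep responses in regions it considers important (such as edges) and suppress the rest.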

The point cloud geometric information is losslessly compressed through an octree encoder, which ensures the accuracy of the geometric information.
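A minimal sketch of lossless octree coding of integer voxel coordinates is given below. It writes one occupancy byte per octree node in breadth-first order and omits the entropy coding that a practical codec would apply to those bytes. The function names, the bit layout of the child index, and the node ordering are assumptions made for this illustration.

```python
import numpy as np

def octree_encode(coords, depth):
    """Serialize voxel coordinates in [0, 2**depth) as occupancy bytes."""
    nodes = {(0, 0, 0): np.unique(coords, axis=0)}  # node origin -> points
    stream = bytearray()
    for level in range(depth):
        half = 1 << (depth - level - 1)             # child cube edge length
        children = {}
        for origin in sorted(nodes):
            pts = nodes[origin]
            # Child index: one bit per axis, set if the point is in the upper half.
            idx = (((pts - origin) >= half) * np.array([4, 2, 1])).sum(axis=1)
            byte = 0
            for c in np.unique(idx):
                c = int(c)
                byte |= 1 << c
                child = (origin[0] + ((c >> 2) & 1) * half,
                         origin[1] + ((c >> 1) & 1) * half,
                         origin[2] + (c & 1) * half)
                children[child] = pts[idx == c]
            stream.append(byte)
        nodes = children
    return bytes(stream)

def octree_decode(stream, depth):
    """Rebuild the voxel coordinates from the occupancy bytes."""
    origins, pos = [(0, 0, 0)], 0
    for level in range(depth):
        half = 1 << (depth - level - 1)
        nxt = []
        for ox, oy, oz in sorted(origins):          # same order as the encoder
            byte = stream[pos]
            pos += 1
            for c in range(8):
                if byte & (1 << c):
                    nxt.append((ox + ((c >> 2) & 1) * half,
                                oy + ((c >> 1) & 1) * half,
                                oz + (c & 1) * half))
        origins = nxt
    return np.array(sorted(origins))

rng = np.random.default_rng(1)
points = rng.integers(0, 16, size=(50, 3))
stream = octree_encode(points, depth=4)
decoded = octree_decode(stream, depth=4)
```

Decoding returns exactly the set of distinct input voxels, which is what makes this stage lossless.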

The point cloud attribute information is lossily compressed through an arithmetic coder: the attribute information is quantized, and its conditional probability estimation is improved by a hyperprior.

During training, the quantization step is replaced by adding random noise. Because quantization itself is not differentiable, this substitution allows gradients to be back-propagated through the model.
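The difference between the training-time and inference-time behavior can be sketched as follows. The text only says that random noise is added; drawing it uniformly from [-0.5, 0.5), so that it has the same range as the rounding error, is an assumption based on common practice in learned compression. NumPy computes no gradients, so the sketch shows only the forward computation; the function name `quantize` is hypothetical.

```python
import numpy as np

def quantize(latent, training, rng=None):
    if training:
        # Assumed noise model: uniform in [-0.5, 0.5), matching the range
        # of the rounding error while keeping the operation differentiable.
        return latent + rng.uniform(-0.5, 0.5, size=latent.shape)
    # Inference: hard rounding to integers, which can be entropy-coded.
    return np.round(latent)

rng = np.random.default_rng(0)
latent = rng.normal(scale=3.0, size=(100, 8))
noisy = quantize(latent, training=True, rng=rng)
rounded = quantize(latent, training=False)
```

In both modes the output differs from the input by at most 0.5 per element, so the network sees a similar perturbation during training and at inference.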

The decoder network comprises a plurality of groups of up-sampling modules and wide receptive field modules. The compressed files are input into the decoder network, the number of points is recovered through up-sampling, and a higher-quality reconstructed point cloud is obtained through the wide receptive field modules, yielding the decoded data.

The up-sampling module is realized by a transposed convolution with a stride of 2. The wide receptive field module consists of a wide receptive field network and a residual module: it extracts feature information through a wide receptive field, and this feature information is input into the residual module to obtain a high-quality reconstructed point cloud.

As shown in FIG. 4, the wide receptive field module is composed of a wide receptive field network and a residual module. The wide receptive field network extracts richer feature information through a wider receptive field, and the residual module is then used to obtain deeper features.

S103, constructing a multi-scale weighted distortion loss function, and training the asymmetric point cloud geometric compression network;

It should be noted that the point cloud reconstruction task is converted into a binary classification problem over voxels: whether a voxel is occupied is judged by calculating its occupancy probability.
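The reconstruction-as-classification idea can be sketched on voxel coordinates as follows. The sketch assumes that one stride-2 up-sampling step proposes the eight children of every occupied voxel and that a child is kept when its predicted occupancy probability exceeds a threshold. The threshold of 0.5, the function names, and the randomly generated probabilities (standing in for network output) are all hypothetical.

```python
import numpy as np

def upsample_coords(coords):
    """Each occupied voxel proposes its 8 children at doubled resolution."""
    offsets = np.array([[dx, dy, dz]
                        for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)])
    return (coords[:, None, :] * 2 + offsets[None, :, :]).reshape(-1, 3)

def classify_occupancy(candidates, probs, threshold=0.5):
    """Keep the candidate voxels predicted as occupied."""
    return candidates[probs > threshold]

parents = np.array([[0, 0, 0], [1, 0, 0]])
candidates = upsample_coords(parents)           # 16 candidate voxels
rng = np.random.default_rng(0)
probs = rng.uniform(size=len(candidates))       # stand-in for network output
reconstructed = classify_occupancy(candidates, probs)
```

Repeating this step at every decoding layer gives one classification problem per scale, which is what the per-scale loss is computed on.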

Training combines the multi-scale weighted distortion loss: a rate-distortion optimization objective function is constructed to train the asymmetric point cloud geometric compression network, as follows:

At each scale, whether each voxel of the reconstructed point cloud is occupied is judged through a binary cross-entropy loss, expressed as:

$$D_k = -\frac{1}{N}\sum_{i=1}^{N}\left[x_i \log p_i + (1 - x_i)\log(1 - p_i)\right]$$

wherein $x_i$ is the true label of the current voxel, $p_i$ is the predicted probability that the voxel is occupied, $N$ is the number of points in the generated point cloud, and $k$ is the decoding layer number. The multi-scale weighted distortion loss is constructed from the binary cross-entropy losses, expressed as:

$$D = \sum_{k} \delta_k D_k$$

wherein $D_k$ is the binary cross-entropy loss at scale $k$, and $\delta_k$ is the corresponding distortion weighting coefficient.
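As a worked illustration, the per-scale binary cross-entropy and its weighted sum over scales can be computed as below. The sketch assumes the standard binary cross-entropy averaged over the voxels of a scale and a plain weighted sum across scales; the function names, the clipping constant `eps`, and the example weights are hypothetical, and the text does not give values for the coefficients.

```python
import numpy as np

def bce_loss(x, p, eps=1e-7):
    """Binary cross-entropy between labels x (0 or 1) and probabilities p."""
    p = np.clip(p, eps, 1.0 - eps)              # avoid log(0)
    return float(-np.mean(x * np.log(p) + (1.0 - x) * np.log(1.0 - p)))

def multiscale_weighted_distortion(labels, probs, weights):
    """Weighted sum of the per-scale losses: sum over k of delta_k * D_k."""
    return sum(w * bce_loss(x, p) for x, p, w in zip(labels, probs, weights))

# Two scales with hypothetical labels, predictions, and weights.
labels = [np.array([1.0, 0.0]), np.array([1.0, 1.0, 0.0, 0.0])]
probs = [np.array([0.5, 0.5]), np.array([0.5, 0.5, 0.5, 0.5])]
weights = [1.0, 0.5]
distortion = multiscale_weighted_distortion(labels, probs, weights)
```

With every prediction at 0.5, each scale contributes ln 2, so the weighted total is 1.5 ln 2. Setting all weights to 1 recovers the unweighted sum that the Background section criticizes.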

And training the asymmetric point cloud geometric compression network according to the multi-scale weighted distortion loss.

It should be noted that training is based on the asymmetric point cloud geometric compression network described above and is completed using a large amount of point cloud data. The target optimization function is constructed from the multi-scale weighted distortion loss: voxel occupancy is determined by classification, the multi-scale loss calculation improves point cloud reconstruction quality, and the loss coefficients measure the importance of the distortion loss at each scale, finally yielding a high-quality point cloud reconstruction.
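The text does not write out the rate-distortion objective itself. A form commonly used in learned compression, given here only as an assumption, combines the estimated bit rate $R$ with the multi-scale weighted distortion $D$ through a trade-off weight $\lambda$; both $R$ and $\lambda$ are notation introduced for this illustration:

```latex
J = R + \lambda D, \qquad D = \sum_{k} \delta_k D_k
```

Under this form, a larger $\lambda$ favors reconstruction quality over bit rate, and training several models with different values of $\lambda$ would yield the individual points of a rate-distortion curve.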

S104, inputting the point cloud data into the trained asymmetric point cloud geometric compression network to realize point cloud compression.

Finally, the compression results of the present invention were tested as follows:

The invention is compared, by rate-distortion curves, with V-PCC (the MPEG compression scheme for dynamic point clouds), G-PCC (the MPEG compression scheme for static point clouds), and the learning-based compression method PCGCv2. G-PCC has two model representations, the octree model and the trisoup model, denoted G-PCC (octree) and G-PCC (trisoup). In FIG. 5, the abscissa is the average number of bits per point (bpp), and the ordinate is the distortion metric, either point-to-point (D1) PSNR or point-to-plane (D2) PSNR; the resulting rate-distortion curves serve as the objective evaluation criterion. It can be seen that the present invention achieves the best rate-distortion performance, and its advantage is more significant at high bit rates.
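The two axes of such a curve can be computed as sketched below. The sketch makes several assumptions that the text does not state: the D1 error is the mean squared nearest-neighbor distance, taken in both directions with the larger value kept, and the PSNR uses three times the squared peak value as the signal term, which is the convention understood to be used by MPEG's point cloud error tool. The brute-force neighbor search only suits small clouds, and the function names are hypothetical.

```python
import numpy as np

def nn_sq_dist(a, b):
    """Squared distance from every point in a to its nearest point in b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1)

def d1_psnr(original, reconstructed, peak):
    """Point-to-point (D1) PSNR in dB, symmetric over both directions."""
    mse = max(nn_sq_dist(original, reconstructed).mean(),
              nn_sq_dist(reconstructed, original).mean())
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(3.0 * peak ** 2 / mse))

def bits_per_point(num_bytes, num_input_points):
    """Average number of bits spent per input point."""
    return 8.0 * num_bytes / num_input_points

original = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
reconstructed = np.array([[0.0, 0.0, 1.0], [10.0, 0.0, 0.0]])
psnr = d1_psnr(original, reconstructed, peak=511)   # 9-bit coordinates
bpp = bits_per_point(num_bytes=100, num_input_points=400)
```

The D2 metric differs only in that each error vector is projected onto the surface normal at the nearest neighbor before squaring, which requires normals and is omitted here.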

The embodiment of the invention also provides a point cloud geometric compression system based on the asymmetric self-encoder structure, which comprises the acquisition module, modeling module, training module, and verification module described above.

The embodiment of the invention also provides a device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the point cloud geometric compression method based on the asymmetric self-encoder structure when executing the computer program.

Embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed in a computer, the computer program is used to make the computer execute a point cloud geometric compression method based on an asymmetric self-encoder structure.

Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in the embodiments of this specification may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on a computer-readable medium, or transmitted over it, as one or more instructions or code.

In summary, point cloud compression techniques that directly process point coordinates require a fixed number of input points and struggle to achieve high-precision reconstruction. Voxel-domain techniques widely use a symmetric self-encoder structure in which the encoder and decoder networks are built identically, so the optimal rate-distortion performance cannot be obtained, point cloud details are difficult to recover, and reconstruction quality is low. The invention provides a point cloud geometric compression method based on an asymmetric self-encoder structure, which achieves better rate-distortion optimization by designing a deeper encoder. An attention module is added to the encoder to improve the recovery of point cloud details. The decoder uses a wide receptive field network to improve the overall quality of the reconstructed point cloud. A multi-scale weighted distortion loss is designed as the target optimization function, improving the performance of the compression model.

While specific embodiments of the invention have been described above in detail, the embodiments presented and described in the flow charts of the invention are provided by way of example for the purpose of providing a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications and variations may be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the claims and their equivalents.

Claims

1. A point cloud geometric compression method based on an asymmetric self-encoder structure is characterized by comprising the following steps:

2. The method of geometric compression of point clouds based on asymmetric self-encoder structures as claimed in claim 1, wherein the pre-processing of the pre-acquired point cloud data set comprises:

randomly rotating the point cloud data to increase data diversity;

3. The method of geometric compression of point clouds based on asymmetric self-encoder structure as claimed in claim 1, characterized in that: the method for constructing the asymmetric point cloud geometric compression network model comprises the following steps:

the encoder network is designed to be more complex than the decoder network, which is beneficial to realizing better rate-distortion performance;

the encoder network comprises a plurality of groups of down-sampling modules and attention modules, performs down-sampling on the input point cloud to obtain down-sampled point cloud geometric information and attribute information, and compresses the geometric information and attribute information separately to realize point cloud encoding;

the decoder network comprises a plurality of groups of up-sampling modules and wide receptive field modules; the compressed files are input into the decoder network, the number of points is recovered through the up-sampling modules, and a higher-quality reconstructed point cloud is obtained through the wide receptive field modules, yielding the decoded data.

4. The method of geometric compression of point clouds based on asymmetric self-encoder structure as claimed in claim 3, characterized in that:

the down-sampling module is realized by a convolution module with a stride of 2;

the attention module comprises an attention branch and a parallel residual branch; the attention branch extracts features through a residual module and applies a sigmoid nonlinear activation to obtain an attention mask; the parallel residual branch extracts a point cloud feature map through three parallel residual networks; the point cloud feature map and the attention mask are multiplied element by element to obtain the down-sampled point cloud geometric information and attribute information.

5. The method of geometric compression of point clouds based on asymmetric self-encoder structure as claimed in claim 3, characterized in that:

the up-sampling module is realized by a transposed convolution module with a stride of 2;

the wide receptive field module consists of a wide receptive field network and a residual module; the wide receptive field module extracts feature information through a wide receptive field, and the feature information is input into the residual module to obtain a high-quality reconstructed point cloud.

6. The method of geometric compression of point clouds based on asymmetric self-encoder structure as claimed in claim 3, characterized in that compressing the geometric information and the attribute information of the down-sampled point cloud respectively comprises the following steps:

7. The method of geometric compression of point clouds based on asymmetric self-encoder structures as claimed in claim 1, wherein: the multi-scale weighted distortion loss function is:

at each scale, whether each voxel of the reconstructed point cloud is occupied is judged through a binary (two-class) cross-entropy loss, and the binary cross-entropy loss expression is as follows:

8. A point cloud geometric compression system based on an asymmetric self-encoder structure, comprising:

9. An apparatus comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein: the processor, when executing the computer program, implements the point cloud geometric compression method based on the asymmetric self-encoder structure as claimed in any one of claims 1-7.

10. A computer-readable storage medium having stored thereon a computer program, characterized in that: when the computer program is executed in a computer, the computer is caused to execute the point cloud geometric compression method based on the asymmetric self-encoder structure as claimed in any one of claims 1 to 7.