CN114092579A - Point cloud compression method based on implicit neural network - Google Patents

Point cloud compression method based on implicit neural network Download PDF

Info

Publication number
CN114092579A
Authority
CN
China
Prior art keywords
network
implicit
hidden
decoder
variables
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111357338.9A
Other languages
Chinese (zh)
Other versions
CN114092579B (en)
Inventor
邹文钦
杨柏林
江照意
叶振虎
丁璐赟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Gongshang University
Original Assignee
Zhejiang Gongshang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Gongshang University filed Critical Zhejiang Gongshang University
Priority to CN202111357338.9A priority Critical patent/CN114092579B/en
Publication of CN114092579A publication Critical patent/CN114092579A/en
Application granted granted Critical
Publication of CN114092579B publication Critical patent/CN114092579B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/002Image coding using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a point cloud compression method based on an implicit neural network. First, a class of shapes in a data set is given and divided into a training set and a test set, and the mesh models in the original data set are preprocessed to obtain SDF values. Second, an overall network framework based on an auto-decoder and fused with an entropy model is designed; it takes observation information and hidden variables quantized by the entropy model as input and outputs SDF estimates for the query points. Model training and inference then yield a hidden variable representing a single shape, which is finally compressed into a binary string to further increase the compression rate and transmitted to the decoding end. The invention does not need to process the 3D shape with complex 3D convolutions; the implicit network is expressed by a simple MLP, so the structure is simpler.

Description

Point cloud compression method based on implicit neural network
Technical Field
The invention belongs to the field of three-dimensional reconstruction and point cloud compression, and particularly relates to a point cloud compression method based on an implicit neural network.
Background
In recent years, the acquisition and application of point cloud data have become increasingly diverse, and point clouds now appear in fields such as autonomous driving, VR, and AR. Compared with meshes or voxels, a point cloud has no complex topological structure and can be acquired directly by a radar sensor. However, point clouds usually contain massive amounts of data, and storing and transmitting them directly consumes large amounts of memory and network bandwidth, so an efficient point cloud compression scheme is necessary.
The traditional mainstream point cloud compression methods are based on two frameworks proposed by MPEG: VPCC and GPCC. VPCC uses video compression technology to compress dynamic point cloud sequences in real time, while GPCC compresses the geometry and attributes of a point cloud, such as color and normals, and is usually used for static point clouds. In contrast, existing deep learning methods are mostly based on a VAE framework: the original point cloud is first down-sampled into a latent space through multiple convolutional layers; different entropy models are combined to learn the probability distribution of the latent space during training; the information behind the encoder is quantized and entropy-coded into a bit stream; and the decoding end up-samples from the bit stream through the decoder to finally reconstruct a new point cloud. Point cloud compression with deep learning further improves the compression rate and accuracy over traditional methods. However, existing deep learning methods have large numbers of network parameters and can only compress and reconstruct at a fixed resolution, resulting in poor scalability.
Since 2019, implicit neural networks have drawn wide attention in the field of 3D shape representation. The main idea of an implicit function is that, given observation information of the original shape and the position of a query point, the network determines whether the query point lies inside or outside the shape. A network composed of multilayer perceptrons fits the implicit surface through learning, and a new shape can finally be reconstructed with the marching cubes algorithm.
An implicit network is defined on the continuous domain of the whole input, which is more efficient than a discrete representation, and it can handle inputs with various topological structures, such as voxels, meshes, and point clouds. Implicit networks are not only spatially continuous but can, in theory, output at unlimited resolution. Compared with existing deep learning methods, an implicit network has a simpler overall structure, does not require multi-layer up-sampling operations, and has good scalability and generalization ability. Therefore, applying the implicit network to the field of point cloud compression is of great significance for the further development of point cloud compression.
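To make the idea concrete, the following is a minimal sketch (not the claimed network of this invention) of a DeepSDF-style implicit decoder in PyTorch: a plain MLP that maps a latent code concatenated with a 3D query point to a scalar SDF value. The layer widths, latent dimension, and Tanh output range are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ImplicitSDFDecoder(nn.Module):
    """Minimal MLP f_theta(z, x) -> SDF; latent_dim and widths are illustrative."""
    def __init__(self, latent_dim: int = 128, hidden: int = 256, depth: int = 4):
        super().__init__()
        layers = []
        in_dim = latent_dim + 3          # latent code concatenated with (x, y, z)
        for _ in range(depth):
            layers += [nn.Linear(in_dim, hidden), nn.ReLU(inplace=True)]
            in_dim = hidden
        layers += [nn.Linear(in_dim, 1), nn.Tanh()]   # keep SDF estimates in [-1, 1]
        self.net = nn.Sequential(*layers)

    def forward(self, z: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        # z: (B, latent_dim), x: (B, 3) query points -> (B, 1) SDF estimates
        return self.net(torch.cat([z, x], dim=-1))

if __name__ == "__main__":
    decoder = ImplicitSDFDecoder()
    z = torch.randn(8, 128)
    x = torch.rand(8, 3) * 2 - 1
    print(decoder(z, x).shape)   # torch.Size([8, 1])
```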
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a point cloud compression method based on an implicit neural network.
The invention comprises the following steps:
Step 1: a class of shapes in the ShapeNet dataset is given and divided into two parts, a training set and a test set.
Step 2: the mesh models in the original data set are preprocessed to obtain SDF values.
Step 3: an overall network framework based on an auto-decoder and fused with an entropy model is designed; it takes observation information and hidden variables quantized by the entropy model as input and outputs SDF estimates for the query points. An L1 loss is constructed from the SDF estimates, and a regularization term on the hidden variable is designed to increase generalization ability. The final loss of the whole network is:
$$\mathcal{L}(\theta, \{z_i\}) = \sum_{i=1}^{N}\left(\sum_{j=1}^{K} L\big(f_\theta(\hat{z}_i, x_j),\, s_j\big) + \frac{1}{\sigma^2}\,\lVert z_i \rVert_2^2 + \lambda\,\mathcal{L}_c(\hat{z}_i)\right)$$

$$\mathcal{L}_c(\hat{z}_i) = -\log_2 p_{\hat{z}}(\hat{z}_i)$$

where $f_\theta$ is the implicit network whose parameters are shared over the training set, $s_j$ is the true SDF value of the input point, $L$ is the L1 loss function, $\hat{z}_i$ is the latent variable after processing by the entropy model, $x_j$ is a sampling point, $\mathcal{L}_c$ is the compression loss of the entropy model, $p_{\hat{z}}$ is the probability distribution function of the hidden variables learned by the network, and $\sigma$ and $\lambda$ are hyper-parameters.
Step 4: the perturbed points in the preprocessed training set are taken as the input x of the implicit network; at the same time, hidden variables of fixed dimensionality are randomly initialized, passed through the entropy model, and concatenated with the input x; the concatenated vector is fed to the implicit network for training, so that the common characteristics of this class of inputs are learned.
Step 5: in the inference stage, the implicit-network weights of the decoder part are fixed, and the randomly initialized hidden variables are optimized through a small number of iterations to obtain the hidden variable that finally represents a single shape; the decoding end can obtain the reconstructed shape from this hidden variable and the network weights.
Step 6: the hidden variables obtained in the test stage are quantized and arithmetically encoded, compressed into a binary string to further increase the compression rate, and the compressed string is transmitted to the decoding end.
The invention has the following beneficial effects: it achieves a considerable compression ratio on a data set of a specific class, obtains hidden-variable information representing the original input through the implicit function network, and quantizes and compresses the hidden variables. The overall network does not need to process the 3D shape with complex 3D convolutions; the implicit network is expressed by a simple MLP, so the structure is simpler.
Drawings
FIG. 1 is a block diagram of an auto-decoder network;
FIG. 2 is a diagram of the overall network framework of the present invention;
FIG. 3 is a graph of the reconstruction losses and compression bit rates obtained by testing networks with different hidden-variable dimensions.
Detailed Description
The invention is further described below with reference to the accompanying drawings:
Step 1: the shapes of a certain category in ShapeNet are divided into a training set and a test set; using shapes of the same category ensures that the training set and the test set have similar shape characteristics.
Step 2: the mesh models in the original data set are preprocessed. Each mesh is normalized to the unit sphere, a fixed number of points is sampled on its surface, and perturbations are added to obtain perturbed points at arbitrary positions in space; the signed distance between each perturbed point and the sampled surface points, i.e. its SDF (signed distance function) value, is then calculated, and the SDF value and the coordinates of the corresponding perturbed point are stored in a new file.
Step 3: the overall network framework is designed; its input is the observation information and the hidden variable quantized by the entropy model, and its output is the SDF estimate of the query point. The auto-decoder principle is shown in FIG. 1: the encoder part is removed from the general encoder-decoder framework, and the hidden variable of the intermediate layer is used directly as the input of the decoder part; during back-propagation, the decoder parameters and the value of the hidden variable are updated simultaneously. The redundancy of an encoder is thus avoided in the inference stage, so the output accuracy is higher.
On top of the auto-decoder, an entropy model is fused in to further process the hidden variables, as shown in FIG. 2, where P is the initial observation-point information and Z is a randomly initialized hidden variable of a given dimensionality. During training, the hidden variable Y is first passed through the quantizer Q and fed into the fully factorized entropy model, which predicts the probability distribution of the hidden variable; $\hat{Y}$ is the quantized value, AE is the arithmetic encoder, AD is the arithmetic decoder, $\hat{Z}$ is the hidden variable after entropy-model processing, and D is the decoder.
The invention uses a multi-layer fully-connected network to train the network parameters θ; all individual shapes S_i in the training set learn their common characteristics on the same implicit network. Whereas the decoder part of traditional network frameworks performs multi-layer up-sampling of the intermediate hidden variables with 3D convolutions and similar operations, the implicit network uses the hidden variables to learn and fit an implicit surface representing the shape, so fewer network parameters are needed and the scalability is better.
To learn $f_\theta$ and the latent codes $\{z_i\}$, for a data set containing N shapes, K sampling points and their SDF values are prepared for each particular shape $S_i$, together with a corresponding hidden variable $z_i$. During training, the network outputs the SDF estimate $f_\theta(\hat{z}_i, x_j)$, and the L1 loss between this estimate and the true SDF value $s_j$ of the input point is minimized, while the joint log-posterior over all training shapes is maximized, i.e. an L2 regularization term on the hidden vector is added, plus the loss of the entropy model:

$$\mathcal{L}(\theta, \{z_i\}) = \sum_{i=1}^{N}\left(\sum_{j=1}^{K} L\big(f_\theta(\hat{z}_i, x_j),\, s_j\big) + \frac{1}{\sigma^2}\,\lVert z_i \rVert_2^2 + \lambda\,\mathcal{L}_c(\hat{z}_i)\right), \qquad \mathcal{L}_c(\hat{z}_i) = -\log_2 p_{\hat{z}}(\hat{z}_i)$$

where $f_\theta$ is the implicit network whose parameters are shared over the training set, $s_j$ is the true SDF value of the input point, $L$ is the L1 loss function, $\hat{z}_i$ is the latent variable after processing by the entropy model, $x_j$ is a sampling point, $\mathcal{L}_c$ is the compression loss of the entropy model, $p_{\hat{z}}$ is the probability distribution function of the hidden variables learned by the network, and $\sigma$ and $\lambda$ are hyper-parameters.
Step 4: the perturbed points in the preprocessed training set are taken as the input x of the implicit network; at the same time, a hidden variable of fixed dimensionality is randomly initialized, processed by the entropy model, and concatenated with x; the concatenated vector is used as the input of the implicit network for training.
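A minimal auto-decoder training loop consistent with this step might look as follows; the dataset format (shape index, query points, SDF values per item) and the use of nn.Embedding to hold one latent code per training shape are assumptions for illustration, while the optimizer settings follow the example given later (Adam, learning rate 0.0005, batch size 12, 60 epochs).

```python
import torch

def train(decoder, entropy_model, dataset, latent_dim=128, epochs=60, lr=5e-4):
    """Auto-decoder training: per-shape latent codes and the shared decoder
    (plus the entropy model) are optimized jointly; there is no encoder network."""
    latents = torch.nn.Embedding(len(dataset), latent_dim)   # one code per training shape
    torch.nn.init.normal_(latents.weight, std=0.01)

    params = (list(decoder.parameters()) + list(entropy_model.parameters())
              + list(latents.parameters()))
    opt = torch.optim.Adam(params, lr=lr)

    loader = torch.utils.data.DataLoader(dataset, batch_size=12, shuffle=True)
    for _ in range(epochs):
        for shape_idx, points, sdf_gt in loader:              # assumed dataset format
            z = latents(shape_idx)
            loss = total_loss(decoder, entropy_model, z, points, sdf_gt)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return latents
```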
Step 5: in the inference stage, since the network parameters of the decoder have already been obtained in training, the network weights of the decoder are fixed and the value of the hidden variable z of each new shape is fine-tuned with a similar loss function:

$$\hat{z} = \arg\min_{z} \sum_{j=1}^{K} L\big(f_\theta(\hat{z}, x_j),\, s_j\big) + \frac{1}{\sigma^2}\,\lVert z \rVert_2^2 + \lambda\,\mathcal{L}_c(\hat{z})$$
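A sketch of this latent-only optimization is shown below, reusing the illustrative total_loss above; the step count and learning rate are assumptions. The entropy model is kept in training mode so that its noise-based quantization proxy stays differentiable.

```python
import torch

def infer_latent(decoder, entropy_model, points, sdf_gt,
                 latent_dim=128, steps=800, lr=5e-3):
    """Inference for a new shape: decoder weights stay frozen, only z is optimized."""
    decoder.eval()
    for p in decoder.parameters():
        p.requires_grad_(False)
    entropy_model.train()   # keep the noise proxy so the rate term stays differentiable

    z = torch.zeros(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        loss = total_loss(decoder, entropy_model, z,
                          points.unsqueeze(0), sdf_gt.unsqueeze(0))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()
```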
step 6: after the hidden variables finally representing a single shape are obtained, the entropy model learns the probability distribution of the hidden space during training, the hidden variables are quantized and arithmetically encoded in combination with the probability distribution, the hidden variables are compressed into binary character strings and transmitted to a decoding end, and the decoding end decompresses the hidden variables through the hidden variables and network parameters.
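The sketch below illustrates the encode/decode round trip. For simplicity it serializes the rounded latent as raw int16 bytes where a real implementation would arithmetic-code the symbols with the pmf learned by the entropy model; the decoder side evaluates the SDF on a dense grid and extracts the surface with marching cubes (scikit-image). The grid resolution and chunk size are illustrative.

```python
import numpy as np
import torch
from skimage import measure

def compress_latent(z_hat: torch.Tensor) -> bytes:
    """Toy serialization of the quantized latent. A real codec would arithmetic-code
    the integer symbols with the entropy model's learned pmf instead of storing
    raw int16 values."""
    return z_hat.round().to(torch.int16).cpu().numpy().tobytes()

def decompress_and_reconstruct(decoder, payload: bytes, latent_dim: int, res: int = 128):
    """Decoder side: recover the latent, evaluate the SDF on a dense grid, and run
    marching cubes to obtain the reconstructed mesh."""
    z = torch.from_numpy(
        np.frombuffer(payload, dtype=np.int16).astype(np.float32)
    ).reshape(1, latent_dim)

    # Evaluate the SDF on a res^3 grid inside the unit cube [-1, 1]^3.
    axis = torch.linspace(-1.0, 1.0, res)
    grid = torch.stack(torch.meshgrid(axis, axis, axis, indexing='ij'),
                       dim=-1).reshape(-1, 3)
    with torch.no_grad():
        sdf = torch.cat([
            decoder(z.expand(chunk.shape[0], -1), chunk)
            for chunk in grid.split(65536)
        ]).reshape(res, res, res).numpy()

    verts, faces, _, _ = measure.marching_cubes(sdf, level=0.0)
    return verts, faces
```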
Example:
the experimental results are shown in fig. 3, and specifically include:
step 1: the stool dataset in the Shapelet dataset was used for training and testing, the training and testing sets were preprocessed, an ADAM optimizer with a learning rate of 0.0005 was used in the experiment, the batch _ size was set to 12, and 60 epochs were trained.
Step 2: the final shape reconstruction effect is controlled by controlling the dimensionality of the initial hidden variables, the dimensionalities of the selected hidden variables are 8, 16, 32, 64, 128 and 256, and the initial hidden variables with different dimensionalities are respectively and independently trained by combining a decoder and an entropy model.
And step 3: in the testing stage, the data of the test set are respectively tested through the trained networks with different hidden variable dimensions to obtain different reconstruction losses and compression bit rates, as shown in fig. 3, the bpp (bit per point) of the invention is changed into 0.0898,0.1172,0.1563, 02383,0.3672 and 0.6055, and the reconstruction losses are changed into 0.0835,0.0382,0.0405,0.0316, 0.0250 and 0.0220.
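For reference, the rate-distortion pairs reported above can be plotted directly (values copied from the text; the resulting figure is only an approximation of FIG. 3):

```python
import matplotlib.pyplot as plt

# Rate-distortion pairs reported above for latent dimensions 8...256.
dims = [8, 16, 32, 64, 128, 256]
bpp  = [0.0898, 0.1172, 0.1563, 0.2383, 0.3672, 0.6055]
loss = [0.0835, 0.0382, 0.0405, 0.0316, 0.0250, 0.0220]

plt.plot(bpp, loss, marker='o')
for d, x, y in zip(dims, bpp, loss):
    plt.annotate(f"dim={d}", (x, y))
plt.xlabel('bits per point (bpp)')
plt.ylabel('reconstruction loss')
plt.title('Rate-distortion of the tested hidden-variable dimensions')
plt.show()
```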

Claims (4)

1. A point cloud compression method based on an implicit neural network is characterized by comprising the following steps:
step 1: a certain class shape in a ShapeNet data set is given and divided into a training set and a testing set;
step 2: preprocessing a mesh model in an original data set to obtain an SDF value;
step 3: designing an overall network framework based on an auto-decoder and fused with an entropy model, inputting observation information and hidden variables quantized by the entropy model, and outputting SDF estimates of query points;
constructing an L1 loss from the SDF estimates, and designing a regularization term on the hidden variable to increase generalization ability, the final loss of the whole network being:
$$\mathcal{L}(\theta, \{z_i\}) = \sum_{i=1}^{N}\left(\sum_{j=1}^{K} L\big(f_\theta(\hat{z}_i, x_j),\, s_j\big) + \frac{1}{\sigma^2}\,\lVert z_i \rVert_2^2 + \lambda\,\mathcal{L}_c(\hat{z}_i)\right), \qquad \mathcal{L}_c(\hat{z}_i) = -\log_2 p_{\hat{z}}(\hat{z}_i)$$

wherein $f_\theta$ is the implicit network whose parameters are shared over the training set, $s_j$ is the true SDF value of the input point, $L$ is the L1 loss function, $\hat{z}_i$ is the latent variable after processing by the entropy model, $x_j$ is a sampling point, $p_{\hat{z}}$ is the probability distribution function of the hidden variables learned by the network, $\mathcal{L}_c$ is the compression loss of the entropy model, and $\sigma$ and $\lambda$ are both hyper-parameters;
step 4: taking the perturbed points in the preprocessed training set as the input x of the implicit network, randomly initializing hidden variables of fixed dimensionality, passing them through the entropy model and concatenating them with the input x, taking the concatenated vector as the input of the implicit network for training, and learning the common characteristics of this class of inputs;
step 5: in the inference stage, fixing the implicit-network weights of the decoder part, optimizing the randomly initialized hidden variables through a small number of iterations to obtain the hidden variable that finally represents a single shape, the decoding end obtaining the reconstructed shape from the hidden variable and the network weights;
step 6: quantizing and arithmetically encoding the hidden variables obtained in the test stage, compressing them into a binary string to further increase the compression rate, and transmitting the compressed string to the decoding end.
2. The point cloud compression method based on the implicit neural network as claimed in claim 1, wherein the auto-decoder removes the encoder part from the general encoder-decoder framework, uses the hidden variable of the intermediate layer directly as the input of the decoder part, and updates the decoder parameters and the value of the hidden variable simultaneously during back-propagation.
3. The point cloud compression method based on the implicit neural network as claimed in claim 1, wherein, during training of the overall network framework, the hidden variables are passed through the quantizer Q into the fully factorized entropy model, and the probability distribution of the hidden variables is predicted by the fully factorized entropy model.
4. The point cloud compression method based on the implicit neural network as claimed in claim 1, wherein a multi-layer fully-connected network is used to train the network parameters θ, and all individual shapes S_i in the training set learn their common characteristics on the same implicit network.
CN202111357338.9A 2021-11-16 2021-11-16 Point cloud compression method based on implicit neural network Active CN114092579B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111357338.9A CN114092579B (en) 2021-11-16 2021-11-16 Point cloud compression method based on implicit neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111357338.9A CN114092579B (en) 2021-11-16 2021-11-16 Point cloud compression method based on implicit neural network

Publications (2)

Publication Number Publication Date
CN114092579A true CN114092579A (en) 2022-02-25
CN114092579B CN114092579B (en) 2024-05-14

Family

ID=80301091

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111357338.9A Active CN114092579B (en) 2021-11-16 2021-11-16 Point cloud compression method based on implicit neural network

Country Status (1)

Country Link
CN (1) CN114092579B (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201703122D0 (en) * 2017-02-27 2017-04-12 Nokia Technologies Oy Encoding and decoding three dimensional video data
WO2019019680A1 (en) * 2017-07-28 2019-01-31 北京大学深圳研究生院 Point cloud attribute compression method based on kd tree and optimized graph transformation
US10970518B1 (en) * 2017-11-14 2021-04-06 Apple Inc. Voxel-based feature learning network
CN110349230A (en) * 2019-07-15 2019-10-18 北京大学深圳研究生院 A method of the point cloud Geometric compression based on depth self-encoding encoder
US20210019918A1 (en) * 2019-07-15 2021-01-21 Peking Universtiy Shenzhen Graduate School Point cloud geometric compression method based on depth auto-encoder
WO2021013334A1 (en) * 2019-07-22 2021-01-28 Toyota Motor Europe Depth maps prediction system and training method for such a system
CN110691243A (en) * 2019-10-10 2020-01-14 叠境数字科技(上海)有限公司 Point cloud geometric compression method based on deep convolutional network
CN111612859A (en) * 2020-05-22 2020-09-01 潍坊学院 Three-dimensional point cloud model compression method based on data dimension reduction and implementation system thereof
CN113284203A (en) * 2021-05-04 2021-08-20 北京航空航天大学 Point cloud compression and decompression method based on octree coding and voxel context

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Yang Bailin (杨柏林) et al.: "Hybrid Adaptive Compression Algorithm for Normal Maps", Proceedings of the 4th Joint Conference on Harmonious Human-Machine Environment, 31 December 2008 (2008-12-31) *
Miao Yongwei (缪永伟); Liu Jiazong (刘家宗); Chen Jiahui (陈佳慧); Shu Zhenyu (舒振宇): "Structure-Preserving Completion of Point Cloud Shapes Based on Generative Adversarial Networks", Scientia Sinica Informationis, no. 05, 31 May 2020 (2020-05-31) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116597082A (en) * 2023-05-17 2023-08-15 杭州电子科技大学 Hub workpiece digitizing method based on implicit three-dimensional reconstruction

Also Published As

Publication number Publication date
CN114092579B (en) 2024-05-14


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant