CN114025146B - Dynamic point cloud geometric compression method based on scene flow network and time entropy model - Google Patents
Info
- Publication number
- CN114025146B (application CN202111285773.5A)
- Authority
- CN
- China
- Prior art keywords
- point cloud
- scene flow
- difference
- information
- compression
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/161—Encoding, multiplexing or demultiplexing different image signal components
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/90—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
- H04N19/91—Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
Abstract
The invention discloses a dynamic point cloud geometric compression method based on a scene flow network and a time entropy model. Aimed mainly at the geometric compression of dynamic point clouds, the invention uses a scene flow network to estimate the motion vectors of the previous-frame point cloud, thereby exploiting temporal redundancy; it treats the motion vectors as point cloud attributes and encodes them with the attribute compression scheme in MPEG to exploit spatial redundancy; it then introduces a time entropy model network to encode, in a latent space, the residual between the predicted frame and the current frame, thereby realizing geometric compression of dynamic point clouds. The method addresses the optimized compression of massive time-sequential dynamic point cloud data and provides technical support for wider application and popularization of three-dimensional dynamic point clouds.
Description
Technical Field
The invention relates to a dynamic point cloud geometric compression method, belongs to the technical field of artificial intelligence and GIS information, and particularly relates to a dynamic point cloud geometric compression method based on a scene flow network and a time entropy model.
Background
A point cloud is a set of sample points of a three-dimensional (or higher-dimensional) geometric model surface; each point contains geometric information (x, y, z) and corresponding attribute information such as color (r, g, b), reflectance, and transparency. A dynamic point cloud is a temporally continuous sequence of point clouds. Unlike mesh data, a point cloud contains no topological information in space and no point-to-point correspondence across time, and it contains considerable noise, so effectively removing spatial and temporal redundancy is difficult.
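For concreteness, a point-cloud frame as described above can be held as an N×3 geometry array plus optional per-point attribute arrays. The following container is a hypothetical sketch for illustration only, not part of the invention:

```python
import numpy as np

# Hypothetical minimal container for one point-cloud frame: (N, 3) geometry
# plus optional per-point attributes, mirroring the description above.
class PointCloudFrame:
    def __init__(self, xyz, colors=None):
        xyz = np.asarray(xyz, dtype=np.float32)
        assert xyz.ndim == 2 and xyz.shape[1] == 3, "geometry must be (N, 3)"
        self.xyz = xyz            # (N, 3): x, y, z per point
        self.colors = colors      # (N, 3): r, g, b per point, or None

    def num_points(self):
        return self.xyz.shape[0]

frame = PointCloudFrame(np.zeros((1000, 3)))
print(frame.num_points())  # 1000
```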
On the other hand, with the development of sensing devices, point clouds are ever easier to acquire and show huge application potential in fields such as immersive 3D telepresence, VR, free-viewpoint motion playback, and autonomous driving. Meanwhile, the data volume of high-resolution dynamic point clouds keeps growing, placing high demands on the storage and transmission capacity of hardware. Research on the compressed storage of dynamic point cloud data therefore has very important practical significance.
According to the available literature, many researchers at home and abroad have in recent years worked on dynamic point cloud compression and proposed a series of schemes, including XOR coding (encoding the difference between the octree structures of adjacent frames), graph-based dynamic point cloud compression, and methods based on ICP (iterative closest point) with intra-frame coding. All achieve compression to differing degrees, but their compression rates are low.
Motion estimation and residual compression are the key factors in dynamic point cloud geometric compression, but previously adopted motion estimation methods, such as graph-based and ICP-based estimation, have low accuracy, and previous residual compression methods, such as the XOR method and block-based intra-frame coding, have a large coding overhead.
Therefore, designing and realizing a compression method that can effectively remove the geometric redundancy of dynamic point clouds has strong practical significance and application value.
Disclosure of Invention
The invention mainly solves the problems existing in the prior art and provides a dynamic point cloud geometric compression method based on a scene flow network and a time entropy model.
The invention is solved by the following technical scheme:
step one: a motion estimation step based on the scene flow network, used to estimate the motion vectors of the previous-frame point cloud relative to the current-frame point cloud;
step two: a motion vector coding and motion compensation step, used to encode the motion vectors estimated in the previous step and to motion-compensate the previous-frame point cloud with the decoded motion vectors, obtaining the predicted point cloud;
step three: a residual compression step, used to encode the difference information between the predicted point cloud and the original point cloud.
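The three steps above can be sketched as an encoder loop. `DummyCodec` and all of its methods are illustrative placeholders (the real components are the scene flow network, the MPEG attribute codec, and the time entropy model network, none of which are reproduced here), and the toy flow assumes the two frames share point count and ordering, which real clouds do not:

```python
import numpy as np

class DummyCodec:
    """Placeholder components; in the invention these are learned networks
    and the MPEG attribute codec. Nothing here is the real implementation."""
    def estimate_scene_flow(self, prev, cur):
        return (cur - prev).astype(np.float32)      # toy per-point motion
    def encode_motion_vectors(self, mv):
        return mv.astype(np.float32).tobytes()      # stand-in bitstream
    def decode_motion_vectors(self, bits):
        return np.frombuffer(bits, dtype=np.float32).reshape(-1, 3)
    def encode_residual(self, predicted, cur):
        return (cur - predicted).astype(np.float32).tobytes()

def encode_frame(prev_decoded, current, codec):
    # Step 1: estimate motion of the previous frame toward the current frame
    motion = codec.estimate_scene_flow(prev_decoded, current)
    # Step 2: code the motion vectors, then compensate with the *decoded*
    # vectors so encoder and decoder form the same prediction
    mv_bits = codec.encode_motion_vectors(motion)
    predicted = prev_decoded + codec.decode_motion_vectors(mv_bits)
    # Step 3: code the residual between prediction and current frame
    res_bits = codec.encode_residual(predicted, current)
    return mv_bits, res_bits
```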
The invention has the following beneficial effects: by introducing the scene flow network, the motion vectors of the previous-frame point cloud can be estimated quickly and accurately, effectively removing temporal redundancy. Treating the motion vectors as a point cloud attribute and encoding them with the attribute compression scheme in MPEG encodes the motion vectors efficiently and effectively exploits spatial redundancy. Introducing the time entropy model network greatly reduces the residual coding cost. Finally, the whole framework uses sparse convolution networks, which greatly reduces memory use and increases running speed.
Drawings
FIG. 1 is an overall framework of dynamic point cloud geometric compression based on a scene flow network and a temporal entropy model provided by an embodiment of the invention.
Detailed Description
The technical scheme of the invention is further specifically described below through examples and with reference to the accompanying drawings.
Examples:
This embodiment provides a dynamic point cloud geometric compression method based on a scene flow network and a time entropy model; as shown in FIG. 1, the method specifically comprises the following steps:
step one: scene flow estimation
First, the decoded previous-frame point cloud and the current-frame point cloud are scaled and quantized, a fixed number of points is randomly sampled from each, and the sampled points are input into the scene flow network for processing. The sampled points pass through several layers of sparse convolution with stride 2 to extract multi-scale features of the point clouds; the scene flow information is then estimated in a bottom-up manner, with a scene flow estimation module estimating the scene flow information of each layer.
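The preprocessing described above (scaling, quantization, then random sampling of a fixed point budget) might be sketched as follows; the function name and parameter values are illustrative, not prescribed by the invention:

```python
import numpy as np

def preprocess(xyz, scale=0.5, n_samples=1000, seed=0):
    """Scale the cloud, quantize to integer voxel coordinates (removing
    duplicates), then randomly sample up to n_samples points for the
    scene flow network. All values here are illustrative defaults."""
    voxels = np.unique(np.floor(np.asarray(xyz) * scale).astype(np.int32), axis=0)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(voxels), size=min(n_samples, len(voxels)), replace=False)
    return voxels[idx]
```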
The scene flow estimation module mainly uses a cost volume submodule and a scene flow predictor submodule to estimate the scene flow information. The cost volume submodule aggregates point-to-point similarities in a patch-to-patch fashion. The scene flow predictor submodule predicts the scene flow information of the current layer mainly from the features of the previous-frame point cloud, the features of the current-frame point cloud, the scene flow information upsampled from the previous layer, and the cost volume information. After the motion vectors of the sampled points are obtained, the motion vectors of all points of the previous frame are obtained by interpolation.
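One common way to propagate the sampled motion vectors to every point of the previous frame is inverse-distance-weighted k-nearest-neighbor interpolation; the patent does not fix the interpolation kernel, so the following is only an assumed sketch:

```python
import numpy as np

def interpolate_flow(sampled_xyz, sampled_flow, query_xyz, k=3, eps=1e-8):
    """Inverse-distance-weighted k-NN interpolation of motion vectors from
    the sampled points to arbitrary query points. Brute force for clarity;
    the kernel choice is an assumption, not taken from the patent."""
    out = np.empty((len(query_xyz), 3), dtype=np.float64)
    for i, p in enumerate(query_xyz):
        d2 = np.sum((sampled_xyz - p) ** 2, axis=1)   # squared distances
        nn = np.argpartition(d2, k)[:k]               # indices of k nearest
        w = 1.0 / (d2[nn] + eps)                      # inverse-distance weights
        out[i] = (w[:, None] * sampled_flow[nn]).sum(axis=0) / w.sum()
    return out
```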
Step two: motion vector compression and motion compensation
The motion vectors are compressed and decompressed using the attribute compression scheme in MPEG to obtain the decompressed motion vectors, which are then used to motion-compensate the decoded previous-frame point cloud, yielding the predicted point cloud.
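Schematically, and assuming matched point order, this step reduces to adding the decompressed vectors to the previous frame. The float16 round-trip below merely stands in for the external MPEG attribute codec and is not its actual behavior:

```python
import numpy as np

def fake_attribute_roundtrip(mv):
    """Stand-in for compressing/decompressing motion vectors as a 3-channel
    point attribute; the float16 round-trip mimics (but is not) codec loss."""
    return np.asarray(mv, dtype=np.float16).astype(np.float32)

def motion_compensate(prev_xyz, mv):
    """Add each decompressed motion vector to the corresponding point of
    the decoded previous frame to obtain the predicted point cloud."""
    mv_dec = fake_attribute_roundtrip(mv)
    return np.asarray(prev_xyz, dtype=np.float32) + mv_dec
```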
Step three: residual compression
The difference between the predicted point cloud and the original point cloud is encoded with the time entropy model network as follows. First, an encoder maps the predicted point cloud and the current-frame point cloud to latent variables Y1 and Y in a latent space. The latent variable Y is represented by its position information C_Y and the corresponding feature information F_Y. Taking the difference between Y1 and Y in the latent space yields the latent-space difference Y_res. The position information of Y is losslessly compressed using octree compression; encoding the difference Y_res is then handled as follows: the position information of Y_res can be obtained by differencing the position information in Y1 and Y, and the feature information of Y_res is first quantized and then losslessly compressed using arithmetic coding. The probability distribution of the feature information of the difference is assumed to follow a Gaussian mixture distribution, and the distribution of each component is approximated by a Gaussian (mean μ, variance σ), so only one network needs to be designed to obtain the Gaussian parameters.
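Under such a Gaussian entropy model, the expected code length of a quantized latent symbol is the negative log of the probability mass its unit-width quantization bin receives under N(μ, σ²); a stdlib-only sketch (the function name is illustrative):

```python
import math

def gaussian_bits(y_q, mu, sigma):
    """Estimated bits for quantized symbol y_q under N(mu, sigma^2):
    -log2 of the mass on the unit-width bin [y_q - 0.5, y_q + 0.5)."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    p = max(cdf(y_q + 0.5) - cdf(y_q - 0.5), 1e-12)   # clamp for stability
    return -math.log2(p)
```

Symbols near the predicted mean cost few bits while surprising ones cost many, which is why accurate (μ, σ) predictions reduce the residual coding cost.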
The information obtained by concatenating Y1 and Y passes through 2 layers of sparse convolution with stride 2 to obtain the latent variable Z. The feature information F_Z of Z is first quantized and then losslessly compressed using arithmetic coding, and the probability distribution information of F_Z is estimated with a fully factorized entropy model. The compressed feature information F_Z is arithmetically decoded to obtain the decoded latent variable Ẑ, which passes through 2 layers of sparse convolution with stride 2 to obtain Z1. The latent variable Y1 of the predicted point cloud passes through 3 layers of sparse convolution to obtain Y2, and the latent variable obtained by concatenating Y2 and Z1 passes through 3 layers of sparse convolution to estimate the probability distribution of the feature information of the difference.
The compressed position information of Y is octree-decoded to obtain the position information Ĉ_Y of the current point cloud's latent variable. Differencing Ĉ_Y with the position information of Y1 yields the position information of the difference Ŷ_res; arithmetically decoding the compressed feature information of the difference yields its feature information; together, the decoded position and feature information form the decoded difference Ŷ_res. Adding the decoded difference Ŷ_res to the latent variable Y1 of the predicted point cloud gives the latent variable Ŷ of the current point cloud, and Ŷ passes through a decoder to obtain the decoded point cloud.
Examples:
The data set used for testing in this embodiment is the dynamic point cloud sequence holder in MPEG. Following the overall flow of FIG. 1, the scene flow network and the time entropy model network are trained first; part of the point cloud data in AMASS is selected for training. The input current-frame point cloud and the decoded previous-frame point cloud are first scaled down by a factor of 2 and quantized, 100000 points are randomly sampled, and the samples are input into the scene flow network to estimate the motion vectors of the previous-frame point cloud. From the resulting motion vectors, the motion vectors of all points in the previous frame are obtained by interpolation and then scaled up by a factor of 2.
The motion vectors are then treated as attribute information of the point cloud and encoded using the lifting transform in MPEG to obtain a bit stream. The bit stream is then decoded with the lifting transform to obtain the decoded motion vectors, and the decoded previous-frame point cloud is motion-compensated with these vectors to obtain the predicted point cloud.
Finally, the current-frame point cloud and the predicted-frame point cloud are input into the time entropy model network to obtain the final decoded point cloud, which is placed into the decoded frame buffer.
The present method and other methods were each tested on the bpp, D1, and D2 metrics, and the results are summarized in the following table:
Here, bpp denotes the average number of bits needed to code each vertex (smaller is better); D1 denotes the point-to-point distortion metric (larger is better); and D2 denotes the point-to-plane distortion metric (larger is better). The results show that this embodiment achieves the best results under both distortion metrics D1 and D2 at the lowest bpp, so compared with previous methods the invention effectively improves the compression rate.
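These metrics can be computed as in the following sketch; the brute-force nearest-neighbor search and the PSNR peak normalization are illustrative (MPEG's D1/D2 tooling uses its own peak conventions), and D2 would additionally project errors onto surface normals:

```python
import numpy as np

def bpp(total_bits, num_points):
    """Bits per point: total coded bits divided by the vertex count."""
    return total_bits / num_points

def d1_psnr(ref, rec, peak=1.0):
    """Point-to-point (D1) geometry PSNR sketch: symmetric nearest-neighbor
    MSE between the two clouds, brute force for clarity (higher is better)."""
    def one_way_mse(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)  # (Na, Nb)
        return d2.min(axis=1).mean()
    mse = max(one_way_mse(ref, rec), one_way_mse(rec, ref))
    return 10.0 * np.log10(peak ** 2 / max(mse, 1e-12))
```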
Claims (3)
1. A dynamic point cloud geometric compression method based on a scene flow network and a time entropy model, characterized by comprising the following steps:
step one: a motion estimation step based on the scene flow network, used to estimate the motion vectors of the previous-frame point cloud relative to the current-frame point cloud;
step two: a motion vector coding and motion compensation step, used to encode the motion vectors estimated in the previous step and to motion-compensate the previous-frame point cloud with the decoded motion vectors, obtaining the predicted point cloud;
step three: a residual compression step, used to encode the difference information between the predicted point cloud and the original point cloud;
the scene flow estimation module in the scene flow network mainly uses a cost volume submodule and a scene flow predictor submodule to estimate the scene flow information, wherein the cost volume submodule aggregates point-to-point similarities in a patch-to-patch fashion, and the scene flow predictor submodule predicts the scene flow information of the current layer mainly from the features of the previous-frame point cloud, the features of the current-frame point cloud, the scene flow information upsampled from the previous layer, and the cost volume information;
the method comprises the steps of utilizing a time entropy model network to encode a difference value between a predicted point cloud and an original point cloud, and specifically comprises the following steps:
mapping the predicted point cloud and the current frame point cloud to hidden variables Y1 and Y in a hidden space by using an encoder;
taking the difference between Y1 and Y in the hidden space to obtain a difference Y in the hidden space res ;
Performing lossless compression on the Y position information by using an octree compression method;
for the difference Y res After the characteristic information of the (B) is quantized, lossless compression is carried out by utilizing arithmetic coding;
the information after Y1 and Y are spliced is subjected to 2-layer sparse convolution with the step length of 2 to obtain hidden variable Z, and characteristic information F of Z is obtained Z Quantization is performed first, then lossless compression is performed by using arithmetic coding, and F is estimated by using a full decomposition entropy model Z Probability distribution information of (2);
compressed characteristic information F Z After arithmetic decoding, the decoded hidden variable is obtained Obtaining Z1 through 2 layers of sparse convolution with the step length of 2;
hidden variable Y1 of the prediction point cloud is subjected to 3-layer sparse convolution to obtain Y2, and then the hidden variable spliced by Y2 and Z1 is subjected to 3-layer sparse convolution to estimate probability distribution of characteristic information of a difference value;
performing octree decoding on the compressed Y position information to obtain hidden variables of the current point cloudWill->Difference between the position information of Y1 and Y1, resulting in a difference +.>Is a part of the position information of the mobile terminal; the difference Y to be compressed res Performing arithmetic decoding on the characteristic information of (2) to obtain difference +.>Is a difference of the position information and the characteristic information of the decoding>
Decoded difference valueAdding hidden variable Y1 of the predicted point cloud to obtain hidden variable ++of the current point cloud>Hidden variable->The decoded point cloud is obtained through a decoder.
2. The dynamic point cloud geometric compression method based on a scene flow network and a time entropy model according to claim 1, characterized in that: the motion vectors are compressed and decompressed using the attribute compression scheme in MPEG to obtain the decompressed motion vectors, and the decompressed motion vectors are used to motion-compensate the decoded previous-frame point cloud to obtain the predicted point cloud.
3. The dynamic point cloud geometric compression method based on a scene flow network and a time entropy model according to claim 1, characterized in that: the position information of the difference Y_res is obtained by differencing the position information in Y1 and Y.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111285773.5A CN114025146B (en) | 2021-11-02 | 2021-11-02 | Dynamic point cloud geometric compression method based on scene flow network and time entropy model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111285773.5A CN114025146B (en) | 2021-11-02 | 2021-11-02 | Dynamic point cloud geometric compression method based on scene flow network and time entropy model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114025146A CN114025146A (en) | 2022-02-08 |
CN114025146B true CN114025146B (en) | 2023-11-17 |
Family
ID=80059612
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111285773.5A Active CN114025146B (en) | 2021-11-02 | 2021-11-02 | Dynamic point cloud geometric compression method based on scene flow network and time entropy model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114025146B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108322742A (en) * | 2018-02-11 | 2018-07-24 | 北京大学深圳研究生院 | A point cloud attribute compression method based on intra prediction
CN109196559A (en) * | 2016-05-28 | 2019-01-11 | 微软技术许可有限责任公司 | Motion-compensated compression of dynamic voxelized point clouds
CN110264502A (en) * | 2019-05-17 | 2019-09-20 | 华为技术有限公司 | Point cloud registration method and device |
CN111476822A (en) * | 2020-04-08 | 2020-07-31 | 浙江大学 | Laser radar target detection and motion tracking method based on scene flow |
CN111866521A (en) * | 2020-07-09 | 2020-10-30 | 浙江工商大学 | Video image compression artifact removing method combining motion compensation and generation type countermeasure network |
CN112862858A (en) * | 2021-01-14 | 2021-05-28 | 浙江大学 | Multi-target tracking method based on scene motion information |
CN113012063A (en) * | 2021-03-05 | 2021-06-22 | 北京未感科技有限公司 | Dynamic point cloud repairing method and device and computer equipment |
CN113281718A (en) * | 2021-06-30 | 2021-08-20 | 江苏大学 | 3D multi-target tracking system and method based on laser radar scene flow estimation |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10499054B2 (en) * | 2017-10-12 | 2019-12-03 | Mitsubishi Electric Research Laboratories, Inc. | System and method for inter-frame predictive compression for point clouds |
CN108632621B (en) * | 2018-05-09 | 2019-07-02 | 北京大学深圳研究生院 | A point cloud attribute compression method based on hierarchical division
WO2020197966A1 (en) * | 2019-03-22 | 2020-10-01 | Tencent America LLC | Method and apparatus for interframe point cloud attribute coding |
- 2021-11-02: CN application CN202111285773.5A, granted as patent CN114025146B (Active)
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109196559A (en) * | 2016-05-28 | 2019-01-11 | 微软技术许可有限责任公司 | Motion-compensated compression of dynamic voxelized point clouds
CN108322742A (en) * | 2018-02-11 | 2018-07-24 | 北京大学深圳研究生院 | A point cloud attribute compression method based on intra prediction
CN110264502A (en) * | 2019-05-17 | 2019-09-20 | 华为技术有限公司 | Point cloud registration method and device |
CN111476822A (en) * | 2020-04-08 | 2020-07-31 | 浙江大学 | Laser radar target detection and motion tracking method based on scene flow |
CN111866521A (en) * | 2020-07-09 | 2020-10-30 | 浙江工商大学 | Video image compression artifact removing method combining motion compensation and generation type countermeasure network |
CN112862858A (en) * | 2021-01-14 | 2021-05-28 | 浙江大学 | Multi-target tracking method based on scene motion information |
CN113012063A (en) * | 2021-03-05 | 2021-06-22 | 北京未感科技有限公司 | Dynamic point cloud repairing method and device and computer equipment |
CN113281718A (en) * | 2021-06-30 | 2021-08-20 | 江苏大学 | 3D multi-target tracking system and method based on laser radar scene flow estimation |
Non-Patent Citations (2)
Title |
---|
Viewpoint-based dynamic rendering of adaptive multi-level-of-detail models for three-dimensional point clouds; Kong Jianhong; Yang Chao; Yu Xiaohui; Qi Guangyuan; Wang Zekun; Science Technology and Engineering (12); full text *
A transmission mechanism based on predictive reconstruction models for lossy mobile networks; Yang Bailin; Zhang Zhiyong; Wang Xun; Pan Zhigeng; Journal of Computer-Aided Design & Computer Graphics (01); full text *
Also Published As
Publication number | Publication date |
---|---|
CN114025146A (en) | 2022-02-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022063055A1 (en) | 3d point cloud compression system based on multi-scale structured dictionary learning | |
WO2019210531A1 (en) | Point cloud attribute compression method based on deleting 0 elements in quantisation matrix | |
WO2019153326A1 (en) | Intra-frame prediction-based point cloud attribute compression method | |
CN109889839B (en) | Region-of-interest image coding and decoding system and method based on deep learning | |
WO2019213986A1 (en) | Multi-angle adaptive intra-frame prediction-based point cloud attribute compression method | |
CN111386551A (en) | Method and device for predictive coding and decoding of point clouds | |
CN110691243A (en) | Point cloud geometric compression method based on deep convolutional network | |
CN108174218B (en) | Video coding and decoding system based on learning | |
JP2015504545A (en) | Predictive position coding | |
KR20140089426A (en) | Predictive position decoding | |
CN110602494A (en) | Image coding and decoding system and method based on deep learning | |
CN113613010A (en) | Point cloud geometric lossless compression method based on sparse convolutional neural network | |
CN112866694A (en) | Intelligent image compression optimization method combining asymmetric volume block and condition context | |
CN108028945A (en) | The apparatus and method of conversion are performed by using singleton coefficient update | |
CN109166160B (en) | Three-dimensional point cloud compression method adopting graph prediction | |
CN114025146B (en) | Dynamic point cloud geometric compression method based on scene flow network and time entropy model | |
CN117354523A (en) | Image coding, decoding and compressing method for frequency domain feature perception learning | |
Bletterer et al. | Point cloud compression using depth maps | |
CN104282030A (en) | Image compression device and method | |
CN115393452A (en) | Point cloud geometric compression method based on asymmetric self-encoder structure | |
Wei et al. | Enhanced intra prediction scheme in point cloud attribute compression | |
CN115239563A (en) | Point cloud attribute lossy compression device and method based on neural network | |
Hajizadeh et al. | Predictive compression of animated 3D models by optimized weighted blending of key‐frames | |
CN110349228B (en) | Triangular mesh compression method for data-driven least square prediction | |
Boulfani-Cuisinaud et al. | Motion-based geometry compensation for DWT compression of 3D mesh sequences |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |