CN113284203A

CN113284203A - Point cloud compression and decompression method based on octree coding and voxel context

Info

Publication number: CN113284203A
Application number: CN202110520566.7A
Authority: CN
Inventors: Not disclosed (inventor requested non-publication)
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2021-05-04
Filing date: 2021-05-04
Publication date: 2021-08-20
Anticipated expiration: 2041-05-04
Also published as: CN113284203B

Abstract

The invention discloses a point cloud compression and decompression method based on octree coding and voxel context. Building on the advantages of both voxel-based and octree-based methods, the method introduces local voxel context into a deep entropy model to better compress octree structure data, and it can be applied to both static and dynamic point cloud compression.

Description

Point cloud compression and decompression method based on octree coding and voxel context

Technical Field

The invention relates to the technical field of point cloud data compression and decompression, in particular to a point cloud compression and decompression method based on octree coding and voxel context.

Background

Artificial intelligence has drastically changed the visual perception capabilities of robots, and machines built on these AI algorithms typically rely on a variety of sensors to perceive and interact with the world. In particular, 3D sensors such as LiDAR and structured light scanners have proven critical in many robotic applications (e.g., autonomous driving, indoor robots, robotic arms, and drones) because they accurately capture the 3D geometry of a scene. These sensors produce large amounts of data: a single Velodyne HDL-64 LiDAR sensor produces over 100,000 points per scan, which amounts to over 84 billion points per day. Such a large volume of raw sensor data poses a significant challenge for storage and real-time communication, so compressing 3D point cloud data efficiently is crucial.

The invention patent application with publication number CN 112581552A, entitled "a self-adaptive blocking point cloud compression method and device based on voxels", discloses a point cloud compression method based on deep learning. The method first performs adaptive blocking on the original point cloud data and then encodes the resulting point cloud blocks: a compression encoder is trained with an improved WBCE (Weighted Binary Cross Entropy) loss function, and the trained encoder is used to encode the adaptively blocked point cloud. However, this method does not make full use of the spatial context information and the overall structure of the point cloud, so redundant information is not fully compressed and the compression performance is not ideal.

Therefore, how to provide a point cloud compression and decompression method with better performance is a problem that needs to be solved urgently by those skilled in the art.

Disclosure of Invention

In view of the above, the invention provides a point cloud compression and decompression method based on octree coding and voxel context. The method structures sparse and unordered point clouds in three-dimensional space with an octree, compresses the octree structure codes using spatial context information, and refines the reconstructed point cloud coordinates using the spatial context. It thereby addresses the problem that existing point cloud compression methods do not make full use of the spatial context information and the overall structure of the point cloud, cannot fully compress redundant information, and therefore achieve unsatisfactory compression performance.

In order to achieve the purpose, the invention adopts the following technical scheme:

a point cloud compression and decompression method based on octree coding and voxel context comprises the following steps:

performing voxelization on the original point cloud data, establishing a corresponding octree structure, and representing child nodes of each non-leaf node in the octree structure by using node codes;

extracting binary voxel representation of a local area taking the current node as the center from the overall voxel representation of the depth corresponding to the current node, and generating local voxel representation corresponding to each node;

obtaining context feature vectors corresponding to the local voxel representations;

and connecting the context feature vector with a feature consisting of coordinate information of the node and depth information of the node in the octree structure, obtaining probability distribution corresponding to the node code through a pre-constructed prediction network, and performing lossless compression coding or decompression of the coding on the node code by using an arithmetic coding algorithm.

Further, the node code is an eight-bit binary code: the eight 0/1 bits of each non-leaf node represent the occupancy states of its eight child nodes.

Further, obtaining a context feature vector corresponding to the local voxel representation specifically includes:

when the original point cloud data is static point cloud, extracting the context feature vector corresponding to the local voxel representation by directly utilizing a three-dimensional convolution network;

when the original point cloud data is a dynamic point cloud and the current node is located at the k-th layer of the octree structure corresponding to the t-th frame, denoting the corresponding local voxel representation as V_i^{t,k}, and extracting, using a three-dimensional convolutional network, features from the four local voxel representations V_i^{t,k}, V_i^{t-1,k}, V_i^{t+1,k} and V_i^{t-1,k+1} to obtain the context feature vector.

The method provided by the invention also expands the compression of dynamic point cloud data, and further reduces the code rate by using the context information of the point cloud of the previous frame and the next frame.

Further, the point cloud compression and decompression method based on octree coding and voxel context further includes:

and predicting the offset of the actual point relative to the voxel center based on the local voxel representation corresponding to the leaf node, and generating a recovery point cloud with 3D coordinates.

This step produces more accurate 3D coordinates for each leaf node in both static and dynamic point clouds, providing a decoder-side coordinate optimization scheme based on the local voxel representation.

According to the above technical solution, compared with the prior art, the invention provides a point cloud compression and decompression method based on octree coding and voxel context. Building on the advantages of both voxel-based and octree-based methods, the method introduces local voxel context into a deep entropy model to better compress octree structure data, and it can be applied to both static and dynamic point cloud compression.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

FIG. 1 is a schematic diagram illustrating an implementation flow of a point cloud compression and decompression method based on octree coding and voxel context according to the present invention;

FIG. 2 is a schematic diagram illustrating an implementation principle of a point cloud compression and decompression method based on octree coding and voxel context;

FIG. 3 is a schematic representation of a binary voxel representation of a local region centered on a current node;

FIG. 4 is a schematic diagram illustrating an implementation principle of a distribution estimation process of node coding;

FIG. 5 is a schematic diagram showing a local voxel representation in a dynamic point cloud compression process;

FIG. 6 is a schematic network structure diagram of a node coding prediction network and a coordinate optimization network.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1 and fig. 2, an embodiment of the present invention discloses a point cloud compression and decompression method based on octree coding and voxel context, including:

S1: Voxelize the original point cloud data, establish a corresponding octree structure, and represent the child nodes of each non-leaf node in the octree structure with a node code.

In this embodiment, the original point cloud is voxelized: after coordinate quantization, a voxel that contains a point is set to 1 and a voxel that contains no point is set to 0. A corresponding octree structure is then established, and the occupancy distribution of the children of each non-leaf node of the octree is represented by an 8-bit binary code s_i.
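The voxelization and node coding of step S1 can be illustrated with a minimal Python sketch. The function names, the cubic grid of side 2^depth, and the bit order of the children (bit j = 4*dx + 2*dy + dz) are assumptions made for illustration; the source does not specify them.

```python
def voxelize(points, origin, voxel_size, depth):
    """Quantize 3D points to integer voxel coordinates on a 2^depth grid."""
    n = 1 << depth
    occupied = set()
    for p in points:
        idx = tuple(int((p[a] - origin[a]) // voxel_size) for a in range(3))
        if all(0 <= c < n for c in idx):
            occupied.add(idx)  # a voxel containing at least one point is 1
    return occupied


def occupancy_codes(occupied, depth):
    """Return {(level, x, y, z): 8-bit code} for every non-leaf node.

    Level 0 is the root and level `depth` holds the leaf voxels.
    Bit j of a code is set when child j is occupied (assumed bit order).
    """
    codes = {}
    level_nodes = set(occupied)
    for level in range(depth, 0, -1):
        parents = {}
        for (x, y, z) in level_nodes:
            parent = (x >> 1, y >> 1, z >> 1)
            child = ((x & 1) << 2) | ((y & 1) << 1) | (z & 1)
            parents[parent] = parents.get(parent, 0) | (1 << child)
        for (px, py, pz), code in parents.items():
            codes[(level - 1, px, py, pz)] = code
        level_nodes = set(parents)  # occupied nodes one level up
    return codes


points = [(0.1, 0.1, 0.1), (0.15, 0.1, 0.1), (0.9, 0.9, 0.9)]
occupied = voxelize(points, origin=(0.0, 0.0, 0.0), voxel_size=0.25, depth=2)
codes = occupancy_codes(occupied, depth=2)
```

In this example the first two points fall into the same voxel, so only two leaf voxels are occupied, and the root code has exactly two bits set.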

S2: in the overall voxel representation of the depth corresponding to the current node, extracting a binary voxel representation of a local area taking the current node as the center, and generating a local voxel representation corresponding to each node.

This step generates the local voxel representation V_i corresponding to each node; that is, a binary voxel representation of a local region centered on the current node is extracted from the overall voxel representation at the depth of the current node. The local voxel representation is shown in FIG. 3.

Referring to FIG. 3, (a) is a visualization of a sample point cloud; (b) is the octree constructed from the voxelized point cloud of (a), where n_i denotes the octree node corresponding to the point r_i in (a); the dark part of (c) shows the position of the voxel of n_i within the overall voxel representation; and (d) is a layered slice view of (c), in which the darkest (black) voxel is the voxel corresponding to n_i and the other dark (gray) voxels form the local context of n_i.
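The extraction of a local voxel representation in step S2 can be sketched as follows. This is only an illustration under stated assumptions: the occupied voxels of one octree depth are held in a Python set, the local region is a cube of side 2*radius + 1, and cells outside the grid read as empty. The name `local_voxel` and the parameter `radius` are hypothetical; the source does not give the size of the local region.

```python
def local_voxel(occupied, center, radius):
    """Binary (2r+1)^3 block of one depth's voxel grid, centered on `center`.

    `occupied` is the set of occupied integer voxel coordinates at the
    depth of the current node. Coordinates not in the set read as 0.
    """
    cx, cy, cz = center
    size = 2 * radius + 1
    return [[[1 if (cx + i - radius, cy + j - radius, cz + k - radius) in occupied
              else 0
              for k in range(size)]
             for j in range(size)]
            for i in range(size)]


occupied = {(2, 2, 2), (3, 2, 2), (0, 0, 0)}
block = local_voxel(occupied, center=(2, 2, 2), radius=1)
```

The center cell of `block` is the voxel of the current node itself, and the remaining cells are its local context; the distant voxel (0, 0, 0) lies outside the window and does not appear.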

S3: Obtain the context feature vector corresponding to the local voxel representation.

Referring to FIG. 4, for static point cloud compression, the invention directly uses a three-dimensional convolutional network to extract the context feature vector f_i corresponding to the local voxel representation V_i.

Referring to FIG. 5, for dynamic point cloud compression, assume that the current node is located at the k-th layer of the octree corresponding to the t-th frame, and denote the corresponding local voxel representation as V_i^{t,k}. A three-dimensional convolutional network extracts features from the four local voxel representations V_i^{t,k}, V_i^{t-1,k}, V_i^{t+1,k} and V_i^{t-1,k+1} to obtain the context feature vector f_i; that is, information from the previous and next frames helps estimate the distribution of the current node code s_i. In dynamic point cloud compression, decompression proceeds layer by layer, so all four local voxel representations are available when the current node is decompressed.

As shown in FIG. 5, layer k of frames t-1, t, and t+1 and layer k+1 of frame t-1 have already been decoded, layer k+1 of frame t is currently being decoded, and layer k+1 of frame t+1 has not yet been decoded.
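The decoding states described for FIG. 5 are consistent with a schedule that decodes layer by layer and, within a layer, frame by frame. The following sketch checks, under that assumed schedule, that the four context sources are already decoded when layer k+1 of frame t is being decoded. The function names and the (frame, layer) tuple convention are hypothetical.

```python
def decode_order(num_frames, num_layers):
    """Assumed schedule: every frame of layer k is decoded before layer k+1."""
    return [(frame, layer)
            for layer in range(num_layers)
            for frame in range(num_frames)]


def context_sources(t, k):
    """(frame, layer) pairs of the four local voxel representations."""
    return [(t, k), (t - 1, k), (t + 1, k), (t - 1, k + 1)]


def available_before(order, target):
    """Set of (frame, layer) pairs fully decoded before `target` starts."""
    return set(order[:order.index(target)])


order = decode_order(num_frames=3, num_layers=4)
t, k = 1, 1
done = available_before(order, (t, k + 1))  # layer k+1 of frame t is in progress
all_ready = all(src in done for src in context_sources(t, k))
```

Under this schedule `all_ready` is true, while layer k+1 of frame t+1 is not in `done`, which matches the undecoded state described above.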

S4: connecting the context feature vector with a feature consisting of coordinate information of the node and depth information of the node in an octree structure, obtaining probability distribution corresponding to node coding through a pre-constructed prediction network, and carrying out lossless compression coding or decompression of coding on the node coding by using an arithmetic coding algorithm.

The context feature vector f_i is concatenated with the feature c_i, which consists of the coordinate information of the node and the depth of the node in the octree, and the prediction network outputs the probability distribution of s_i. Based on this distribution, an arithmetic coding algorithm losslessly compresses s_i or decompresses its code. The prediction network in this embodiment is pre-trained on real point cloud data; its structure is shown in part (a) of FIG. 6.
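The relationship between the predicted distribution and the compressed size can be sketched in a few lines of Python. The sketch assumes that the network output is a vector of 256 scores, one per possible value of the 8-bit code, and that c_i is the tuple (x, y, z, depth); the source states neither the output dimension nor the exact layout of c_i. The trained prediction network itself is not reproduced here, so a vector of equal scores stands in for its output.

```python
import math


def node_feature(context_vec, coords, depth):
    """Concatenate f_i with c_i = (x, y, z, depth) (assumed layout)."""
    return list(context_vec) + list(coords) + [depth]


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(v - m) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]


def code_length_bits(probs, symbol):
    """Ideal arithmetic-coding cost of `symbol` under `probs`, in bits."""
    return -math.log2(probs[symbol])


feature = node_feature([0.5, 0.25], coords=(3, 1, 2), depth=4)
probs = softmax([0.0] * 256)          # stand-in for the network output
bits = code_length_bits(probs, 129)   # cost of coding s_i = 0b10000001
```

With a uniform distribution each code costs 8 bits, the same as storing it raw. Compression comes from the network assigning a higher probability to the code that actually occurs, which lowers `-log2 p`.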

The arithmetic coding algorithm mentioned above is one of the main algorithms used in image compression; it is a lossless data compression method and a form of entropy coding. Whereas other entropy coding methods generally split the input message into symbols and encode each symbol separately, arithmetic coding encodes the entire input message as a single number, a fraction n satisfying 0.0 ≤ n < 1.0.
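The idea of encoding a whole message as one number in [0, 1) can be shown with an idealized coder that uses exact rational arithmetic. This is a teaching sketch, not a practical range coder: it uses a fixed probability table, whereas the method described here would supply a different distribution for each node code, and the decoder is told the message length. All names are hypothetical.

```python
from fractions import Fraction


def cumulative(probs):
    """Map each symbol to its [low, high) slice of the unit interval."""
    ranges, low = {}, Fraction(0)
    for sym, p in probs.items():
        ranges[sym] = (low, low + p)
        low += p
    return ranges


def encode(message, probs):
    """Narrow [0, 1) once per symbol and return the final lower bound."""
    ranges = cumulative(probs)
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        s_low, s_high = ranges[sym]
        low, width = low + width * s_low, width * (s_high - s_low)
    return low  # any number in [low, low + width) identifies the message


def decode(value, length, probs):
    """Invert `encode` by locating `value` in a slice, then rescaling."""
    ranges = cumulative(probs)
    out = []
    for _ in range(length):
        for sym, (s_low, s_high) in ranges.items():
            if s_low <= value < s_high:
                out.append(sym)
                value = (value - s_low) / (s_high - s_low)
                break
    return out


probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
message = list("abac")
n = encode(message, probs)
restored = decode(n, len(message), probs)
```

The final interval has width equal to the product of the symbol probabilities, so likely messages map to wide intervals that can be identified with few bits.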

Preferably, the point cloud compression and decompression method based on octree coding and voxel context further includes:

S5: Predict the offset of the actual point relative to the voxel center based on the local voxel representation corresponding to the leaf node, and generate a restored point cloud with 3D coordinates.

A voxel representation is obtained from the decompressed octree code sequence, and each non-empty voxel in it corresponds to one point in the restored point cloud. In this embodiment, a coordinate optimization network predicts the offset of the actual point from the voxel center based on the local voxel representation V_i corresponding to the leaf node, yielding a restored point cloud with more accurate 3D coordinates. The structure of the coordinate optimization network is shown in part (b) of FIG. 6.
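The reconstruction in step S5 can be sketched as voxel centers plus a predicted offset. In the sketch below, `predict_offset` is a hypothetical stand-in for the coordinate optimization network; it receives the voxel index rather than the local voxel representation V_i to keep the example self-contained. Expressing the offset in units of the voxel size is also an assumption.

```python
def reconstruct_points(leaf_voxels, origin, voxel_size, predict_offset=None):
    """Turn occupied leaf voxels into 3D points.

    Without `predict_offset`, each point is placed at its voxel center.
    With it, the returned offset (in voxel-size units, assumed to lie in
    [-0.5, 0.5] per axis) shifts the point inside its voxel.
    """
    points = []
    for voxel in sorted(leaf_voxels):
        center = [origin[a] + (voxel[a] + 0.5) * voxel_size for a in range(3)]
        offset = predict_offset(voxel) if predict_offset else (0.0, 0.0, 0.0)
        points.append(tuple(center[a] + offset[a] * voxel_size for a in range(3)))
    return points


leaves = {(0, 0, 0), (1, 0, 0)}
centers = reconstruct_points(leaves, origin=(0.0, 0.0, 0.0), voxel_size=1.0)
refined = reconstruct_points(leaves, origin=(0.0, 0.0, 0.0), voxel_size=1.0,
                             predict_offset=lambda v: (0.25, 0.0, -0.25))
```

Without refinement the reconstruction error can be as large as half a voxel per axis; a good offset prediction reduces this error without adding anything to the bitstream, since the offsets are computed on the decoder side.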

Referring to FIG. 6, (a) the node code prediction network first obtains the context feature vector from the local voxel context through a 4-layer three-dimensional convolutional network, then concatenates it with the feature consisting of the coordinate information of the node and the depth of the node in the octree structure, and finally obtains the predicted probability distribution through a network of 4 fully connected layers and one softmax layer.

(b) The coordinate optimization network first obtains the context feature vector from the local voxel context through a 4-layer three-dimensional convolutional network and then obtains the predicted coordinate offset through a network of 4 fully connected layers. Both networks use ReLU activation functions, at the positions shown in FIG. 6(b).

The point cloud data compression and decompression method disclosed by the embodiment utilizes the high-efficiency data organization capability of the octree structure-based method and the capability of effectively representing local information of the voxel-based method, and can be applied to static and dynamic point cloud geometric compression.

Specifically, an octree structure is first built from the input original point cloud, with the eight 0/1 bits of the symbol of each non-leaf node representing the occupancy states of its eight child nodes. In the entropy coding stage, the present embodiment proposes a new entropy model based on deep learning to predict the probability distribution of these symbols for compression. To generate the context information of the entropy model effectively, the present embodiment uses a local binary voxel representation of each node, whose content is the distribution of adjacent nodes at the same depth in a local region centered on the currently processed node.

Furthermore, to reduce the information redundancy in dynamic point cloud compression, the present embodiment also includes the local voxel representation at the same location in the point cloud of the previous frame to generate richer context information. In the reconstruction stage, the present embodiment further proposes a decoder-side coordinate optimization method based on the local voxel representation to generate more accurate 3D coordinates for each leaf node in static and dynamic point clouds.

In summary, the point cloud compression and decompression method based on octree coding and voxel context disclosed in the embodiments of the present invention structures sparse and disordered point clouds in a three-dimensional space by using an octree structure, compresses octree structure coding by combining spatial context information, and optimizes reconstructed point cloud coordinates by using spatial context. Meanwhile, the invention also expands the compression of dynamic point cloud data, and further reduces the code rate by utilizing the point cloud context information of the previous frame and the next frame.

The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A point cloud compression and decompression method based on octree coding and voxel context is characterized by comprising the following steps:

obtaining a context feature vector corresponding to the local voxel representation through a pre-constructed feature extraction network;

2. The method of claim 1, wherein the node coding is an eight-bit binary coding.

3. The method according to claim 1, wherein obtaining the context feature vector corresponding to the local voxel representation comprises:

extracting, using a three-dimensional convolutional network, features from the four local voxel representations V_i^{t,k}, V_i^{t-1,k}, V_i^{t+1,k} and V_i^{t-1,k+1} to obtain the context feature vector.

4. The method for compressing and decompressing point cloud based on octree coding and voxel context according to any one of claims 1 to 3, further comprising: