CN115471576A - Point cloud lossless compression method and device based on deep learning - Google Patents

Point cloud lossless compression method and device based on deep learning

Info

Publication number
CN115471576A
Authority
CN
China
Prior art keywords
node
layer
tensor
octree
small segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211080672.9A
Other languages
Chinese (zh)
Inventor
Wang Yan (王岩)
Jin Yiqi (金熠琦)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202211080672.9A
Publication of CN115471576A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/002Image coding using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention provides a point cloud lossless compression method and device based on deep learning, comprising the following steps: representing a target point cloud with an octree; traversing the octree breadth-first to obtain the node sequence corresponding to each layer of the octree; splitting the node sequence corresponding to each layer of the octree with a context window to obtain the small segment sequences corresponding to each layer of the octree; and performing parallel lossless compression on the small segment sequences corresponding to each layer of the octree, in top-to-bottom layer order, to obtain the compression result of the target point cloud. By compressing the small segment sequences layer by layer in parallel, the invention greatly improves the point cloud compression speed; meanwhile, each small segment sequence is encoded in blocks by a self-attention neural network model, so that more node context information can be used during encoding, further improving the compression rate.

Description

Point cloud lossless compression method and device based on deep learning
Technical Field
The invention relates to the technical field of point cloud compression, and in particular to a point cloud lossless compression method and device based on deep learning.
Background
With the rapid development of 3D scanning technology, complex geometric information can now be captured effectively through fine-grained point clouds. Point cloud compression is crucial to the storage and transmission of point cloud data, so the development of point cloud compression technology is receiving wide attention.
At present, common point cloud compression methods include OctAttention and VoxelContext-Net. OctAttention is a self-attention based compression method that converts the point cloud into an octree; the octree is flattened into a node sequence by breadth-first traversal, where the information of each node comprises the octree layer where the node is located, the position of the node within its parent node, and the OctValue of the node. A neural network containing self-attention then learns the OctValue probability distribution of each octree node from its context, and finally entropy coding is applied to these distributions to obtain the compressed point cloud file. Because only the information of the nodes preceding a node is used as context when encoding that node, the context is unevenly distributed in space, which degrades the prediction accuracy of the OctValue probability distribution and hence the coding performance. VoxelContext-Net differs from OctAttention in that, when encoding each node, it predicts the OctValue probability distribution with a convolutional neural network, using the information of spatially nearby nodes as context. This method makes little use of the OctValues of nodes in the same layer, and the limited receptive field of the convolutional network cannot cover a large space, so very little context information is used; the prediction accuracy of the OctValue probability distribution is therefore poor, and so is the coding performance.
Disclosure of Invention
The invention provides a point cloud lossless compression method and device based on deep learning, which are used to overcome the poor point cloud compression performance of the prior art. By dividing node sequences into small segment sequences and block-encoding each small segment sequence with a self-attention neural network model, more node context information is brought into the encoding process, improving the coding compression rate.
In a first aspect, the invention provides a point cloud lossless compression method based on deep learning, which comprises the following steps:
representing a target point cloud by using an octree;
performing breadth-first traversal on the octree to obtain a node sequence corresponding to each layer of the octree;
separating the node sequence corresponding to each layer of the octree by using a context window to obtain a small segment sequence corresponding to each layer of the octree;
performing parallel lossless compression on the small segment sequences corresponding to each layer of the octree according to the sequence of the octree layers from top to bottom to obtain a compression result of the target point cloud;
the lossless compression of the small segment sequence is realized by grouping and coding the small segment sequence by using a self-attention neural network model.
According to the point cloud lossless compression method based on deep learning provided by the invention, the node sequence corresponding to each layer of the octree is separated by using a context window to obtain a small segment sequence corresponding to each layer of the octree, and the method comprises the following steps:
and dividing the node sequence corresponding to each layer of the octree by using the length of the context window as the length of the small segment sequence to obtain the small segment sequence corresponding to each layer of the octree.
According to the point cloud lossless compression method based on deep learning provided by the invention, lossless compression is carried out on the small segment sequence, and the method comprises the following steps:
grouping nodes in the small segment sequence;
determining the OctValue probability distribution of each node in the small segment sequence by using a self-attention neural network model, the first mask tensor, the second mask tensor and the node information of each node and the ancestor nodes thereof in the small segment sequence;
performing node block coding on the small segment sequence according to the increasing sequence of the group number based on the OctValue probability distribution of each node in the small segment sequence to obtain a coding result of the small segment sequence;
the context nodes comprise the layer number of the octree where the nodes are located, the positions of the nodes in a father node and the OctValue of the nodes; the OctValue of the node is an eight-bit binary number representing the existence condition of eight subspace points of the node;
the first mask tensor is to mask the OctValue of all nodes in the sequence of fragments; the second mask tensor is intended to mask the OctValue of the node with the group number larger than the group number of the corresponding node in the small segment sequence in the learning process of the OctValue probability distribution of each node in the small segment sequence.
According to the point cloud lossless compression method based on deep learning provided by the invention, the grouping of the nodes in the small segment sequence comprises the following steps:
the node serial numbers of nodes in the same group differ by LM;
wherein M is the number of groups and L is an integer greater than or equal to 1.
According to the point cloud lossless compression method based on deep learning provided by the invention, determining the OctValue probability distribution of each node in the small segment sequence by using the self-attention neural network model, the first mask tensor, the second mask tensor, and the node information of each node and its ancestor nodes in the small segment sequence comprises the following steps:
representing context nodes of each node and ancestor nodes thereof in the small segment sequence in a three-dimensional tensor form;
processing the three-dimensional tensor by using the self-attention neural network model based on the first mask tensor and the second mask tensor to obtain the OctValue probability distribution of each node in the small segment sequence.
According to the point cloud lossless compression method based on deep learning provided by the invention, the size of the three-dimensional tensor is (N, K, 3), and the self-attention neural network model comprises an Embedding layer, a first Reshape layer, a first Transformer network structure, a second Transformer network structure, a second Reshape layer, a first Linear layer and a SoftMax layer;
processing the three-dimensional tensor by using the self-attention neural network model based on the first mask tensor and the second mask tensor to obtain an OctValue probability distribution of each node in the small segment sequence, including:
converting the three-dimensional tensor to size (N, K, S) in the Embedding layer;
reshaping the (N, K, S) three-dimensional tensor into a two-dimensional tensor of size (N, KS) in the first Reshape layer;
in the first Transformer network structure, performing node OctValue probability distribution learning on the two-dimensional tensor using the first mask tensor, to obtain a first learning tensor of size (N, KS);
in the second Transformer network structure, performing node OctValue probability distribution learning on the two-dimensional tensor using the second mask tensor, to obtain a second learning tensor of size (N, KS);
reshaping the sum of the first learning tensor and the second learning tensor into a two-dimensional tensor of size (N, KS) in the second Reshape layer;
regularizing the two-dimensional tensor output by the second Reshape layer with the first Linear layer, to obtain a two-dimensional tensor of size (N, 255);
normalizing the (N, 255) two-dimensional tensor in the SoftMax layer, the OctValue probability distribution of each node in the small segment sequence being obtained from the normalization result; wherein N is the number of nodes contained in the small segment sequence, K is the number of ancestor nodes traced upwards, and S is any integer greater than 3.
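The pipeline above can be sketched in PyTorch as follows. This is a hedged reconstruction under stated assumptions: the Embedding layer is modeled as a learned projection of the 3 node features to S channels (the patent's Embedding may differ), the two Transformer network structures are standard encoder stacks taking their respective masks in PyTorch's mask convention, and the Positional Encoding and Truncate layers mentioned later are omitted.

```python
import torch
import torch.nn as nn

class OctValueModel(nn.Module):
    """Sketch of the described pipeline: Embedding -> Reshape -> two masked
    Transformer branches -> sum -> Reshape -> Linear -> SoftMax."""
    def __init__(self, S=128, K=4, R=4, heads=8):
        super().__init__()
        self.embed = nn.Linear(3, S)           # (N, K, 3) -> (N, K, S)
        make_branch = lambda: nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=K * S, nhead=heads,
                                       batch_first=True), num_layers=R)
        self.branch1, self.branch2 = make_branch(), make_branch()
        self.linear = nn.Linear(K * S, 255)    # first Linear layer

    def forward(self, x, mask1, mask2):
        # x: (N, K, 3); K must match the model's configured ancestor count.
        N = x.shape[0]
        h = self.embed(x).reshape(1, N, -1)    # first Reshape: (1, N, KS)
        h1 = self.branch1(h, mask=mask1)       # first Transformer branch
        h2 = self.branch2(h, mask=mask2)       # second Transformer branch
        logits = self.linear((h1 + h2).reshape(N, -1))   # second Reshape + Linear
        return torch.softmax(logits, dim=-1)   # (N, 255) OctValue distribution
```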
According to the point cloud lossless compression method based on deep learning, the first Transformer network structure and the second Transformer network structure are identical and are formed by serially stacking R encoders;
the encoder comprises a first sub-layer connection structure and a second sub-layer connection structure connected with the first sub-layer connection structure;
the first sublayer connection structure comprises a multi-head self-attention sublayer, a normalization layer and a residual connection;
the second sub-layer connection structure comprises a second Linear layer, a normalization layer and a residual connection;
in the first Transformer network structure, performing node OctValue probability distribution learning on the two-dimensional tensor using the first mask tensor, to obtain a first learning tensor of size (N, KS), comprises:
taking the two-dimensional tensor as the input of the multi-head self-attention sub-layer of the first encoder in the first Transformer network structure, taking the first mask tensor as the input mask tensor of the multi-head self-attention sub-layer of each encoder in the first Transformer network structure, and performing node OctValue probability distribution learning on the two-dimensional tensor with the first Transformer network structure, to obtain a first learning tensor of size (N, KS);
in the second Transformer network structure, performing node OctValue probability distribution learning on the two-dimensional tensor using the second mask tensor, to obtain a second learning tensor of size (N, KS), comprises:
taking the two-dimensional tensor as the input of the multi-head self-attention sub-layer of the first encoder in the second Transformer network structure, taking the second mask tensor as the input mask tensor of the multi-head self-attention sub-layer of each encoder in the second Transformer network structure, and performing node OctValue probability distribution learning on the two-dimensional tensor with the second Transformer network structure, to obtain a second learning tensor of size (N, KS).
According to the point cloud lossless compression method based on deep learning, provided by the invention, the self-attention neural network model is constructed on the basis of a pre-training neural network;
the pre-training neural network is trained by using a small segment sequence sample of octree nodes with nodes OctValue randomly set to zero.
In a second aspect, the present invention provides a point cloud lossless compression apparatus based on deep learning, including:
the octree representation module is used for representing the target point cloud by utilizing the octree;
the breadth-first traversal module is used for carrying out breadth-first traversal on the octree to obtain a node sequence corresponding to each layer of the octree;
a separation module, configured to separate, by using a context window, a node sequence corresponding to each layer of the octree to obtain a small segment sequence corresponding to each layer of the octree;
the compression module is used for performing parallel lossless compression on the small segment sequences corresponding to each layer of the octree according to the sequence of the octree layers from top to bottom so as to obtain a compression result of the target point cloud;
the lossless compression of the small segment sequence is realized by grouping and coding the small segment sequence by using a self-attention neural network model.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the deep learning-based point cloud lossless compression method according to the first aspect when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method for lossless compression of point clouds based on deep learning according to the first aspect.
The invention provides a point cloud lossless compression method and device based on deep learning, improving the technique of point cloud compression via octrees. Specifically: the target point cloud is converted into an octree; each layer of octree nodes is flattened into a node sequence in breadth-first order; the flattened node sequence of each layer is then divided with a context window, yielding the small segment sequences corresponding to each layer of the octree; and the small segment sequences corresponding to each layer are losslessly compressed in parallel, in top-to-bottom layer order, to obtain the compression result of the target point cloud. By compressing the small segment sequences layer by layer in parallel, the invention greatly improves the point cloud compression speed; meanwhile, each small segment sequence is block-encoded with the self-attention neural network model, so that more node context information can be used during encoding, further improving the compression rate.
Drawings
In order to more clearly illustrate the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a schematic flow chart of a point cloud lossless compression method based on deep learning provided by the present invention;
FIG. 2 is a schematic structural diagram of a Transformer network structure provided in the present invention;
FIG. 3 is a schematic structural diagram of a point cloud lossless compression apparatus based on deep learning according to the present invention;
fig. 4 is a schematic structural diagram of an electronic device implementing a point cloud lossless compression method based on deep learning according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
Abbreviations specific to the field of expertise are explained as follows:
Voxel: a data structure for representing a three-dimensional object, using fixed-size cubes as the minimum unit.
Octree: a tree data structure for describing a three-dimensional space. Each node of the octree represents a volume element of a cube, each node has eight children nodes, corresponding to eight equal subspaces of the cube, and the volume elements represented by the eight children nodes are added together to be equal to the volume of a parent node.
OctValue: an eight-bit binary number formed from the occupancy of the eight child nodes of an octree node.
Compression rate (bpp): the average number of bits occupied by each point after point cloud compression.
Context window: a fixed-length window used to split the node sequence of each octree layer into small segment sequences.
bptt: a small segment sequence obtained by dividing a node sequence with the context window.
The point cloud lossless compression method and device based on deep learning of the present invention are described below with reference to fig. 1 to 4.
In a first aspect, the present invention provides a point cloud lossless compression method based on deep learning, as shown in fig. 1, the method includes:
s11, representing target point cloud by using an octree;
representing the target point cloud using an octree may reduce spatial redundancy.
Specifically, an octree is a tree data structure describing 3D space, in which every internal node has eight child nodes. Each node of the octree represents a cubic space, and its eight child nodes subdivide that space into eight equal subspaces; the point cloud occupancy of the space is thus abstractly described by an 8-bit binary number formed from the point-existence status of the 8 sub-cubes (1 if points exist, 0 otherwise).
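To make the OctValue construction concrete, the following Python sketch (an illustration under assumed conventions, not the patent's implementation) derives a node's 8-bit OctValue from the occupancy of its eight child octants; the octant ordering and helper names are hypothetical.

```python
import numpy as np

def octvalue(points, center):
    """Compute the 8-bit OctValue of a cube node: bit i is set iff the
    i-th child octant contains at least one point (the octant order is an
    illustrative convention, not the patent's)."""
    value = 0
    for i in range(8):
        # Octant i is picked by the sign pattern of (p - center) per axis.
        signs = np.array([(i >> 2) & 1, (i >> 1) & 1, i & 1]) * 2 - 1
        inside = np.all((points - center) * signs > 0, axis=1)
        if inside.any():
            value |= 1 << (7 - i)
    return value
```

For example, a node whose points all fall in its first and last octants would get OctValue 0b10000001 = 129 under this convention.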
S12, performing breadth-first traversal on the octree to obtain a node sequence corresponding to each layer of the octree;
Breadth-first traversal is level-order traversal: each layer is visited in turn from top to bottom; within a layer, nodes are visited from left to right (or from right to left), and the next layer is entered once all nodes of the current layer have been visited, until no nodes remain.
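As a minimal sketch of this level-order traversal (assuming each node exposes a hypothetical `children` list of up to eight entries, with `None` for empty octants):

```python
from collections import deque

def layer_sequences(root):
    """Breadth-first traversal returning one node sequence per octree layer,
    visiting layers top to bottom and nodes left to right within a layer."""
    layers, queue = [], deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == len(layers):
            layers.append([])          # first visit of a new layer
        layers[depth].append(node)
        for child in node.children:    # hypothetical attribute
            if child is not None:
                queue.append((child, depth + 1))
    return layers
```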
S13, separating the node sequence corresponding to each layer of the octree by using a context window to obtain a small segment sequence corresponding to each layer of the octree;
it will be appreciated that by separating the node sequences representing nodes at any level of the octree, a plurality of small segment sequences can be obtained accordingly.
S14, performing parallel lossless compression on small segment sequences corresponding to each layer of the octree according to the sequence of the octree layers from top to bottom to obtain a compression result of the target point cloud;
In other words, the small segment sequences belonging to one octree layer are encoded at the same time, and the small segment sequences of layers closer to the root node are encoded before those of layers farther from the root node. Each group of nodes is encoded by entropy coding.
For example, suppose the octree has seven layers, i.e., a first layer, a second layer, …, and a seventh layer from top to bottom. The small segment sequences split from each layer's node sequence are compressed in parallel; those split from the first layer's node sequence are compressed before those from the second layer, those from the second layer before those from the third layer, and so on, finally yielding the compression result of the target point cloud.
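A schematic loop for this top-down, layer-parallel schedule might look as follows; `split_layer` and `encode_segments_in_parallel` are assumed helper names, not the patent's API:

```python
def compress_octree(layers, split_layer, encode_segments_in_parallel):
    """Encode layers strictly from the root downwards; within one layer,
    all small segment sequences are encoded at the same time."""
    bitstreams = []
    for depth, nodes in enumerate(layers):           # top -> bottom
        segments = split_layer(nodes)                # context-window split
        bitstreams.append(encode_segments_in_parallel(segments))
    return bitstreams
```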
In order to accelerate decoding, the invention proposes encoding by octree layer and encoding the small segment sequences of the same layer in parallel; when encoding each layer, the information of all octree nodes of the upper K layers can be used.
The lossless compression of the small segment sequence is realized by grouping and coding the small segment sequence by using a self-attention neural network model.
Prior studies have shown that the information of ancestor and sibling nodes is equally important for the encoding of the current node.
Existing compression techniques either use insufficient context information or incur intolerable decoding complexity (for example, the OctAttention method can only decode one node at a time, so the self-attention neural network must be invoked once per node, causing very large time overhead and extremely slow decoding). The invention therefore provides a block coding strategy based on a self-attention neural network model that develops a sufficient and effective context for each node and decodes a group of nodes at a time, reducing decoding complexity and improving decoding speed.
The point cloud lossless compression method based on deep learning improves the technique of point cloud compression via octrees. Specifically: the target point cloud is converted into an octree; each layer of octree nodes is flattened into a node sequence in breadth-first order; the flattened node sequence of each layer is then divided with a context window, yielding the small segment sequences corresponding to each layer of the octree; and the small segment sequences corresponding to each layer are losslessly compressed in parallel, in top-to-bottom layer order, to obtain the compression result of the target point cloud. By compressing the small segment sequences layer by layer in parallel, the invention greatly improves the point cloud compression speed; meanwhile, each small segment sequence is block-encoded with the self-attention neural network model, so that more node context information can be used during encoding, further improving the compression rate.
On the basis of the foregoing embodiment, as an optional embodiment, the separating, by using a context window, the node sequence corresponding to each layer of the octree to obtain a small segment sequence corresponding to each layer of the octree includes:
and dividing the node sequence corresponding to each layer of the octree, with the length of the context window as the length of the small segment sequence, to obtain the small segment sequences corresponding to each layer of the octree.
For the node sequence of each layer, zero nodes are first appended at the end of the node sequence so that its length is an integral multiple of the context window length;
the node sequence is then divided from left to right, with the context window length as the small segment sequence length, to obtain a plurality of small segment sequences.
For example, if the node sequence is {x1, x2, …, x36} and the context window covers 8 nodes, it can be split into the five small segment sequences {x1, …, x8}, {x9, …, x16}, {x17, …, x24}, {x25, …, x32} and {x33, x34, x35, x36, 0, 0, 0, 0}.
This operation ensures that the nodes in each context window constitute a small sequence, which is convenient for processing in the self-attention neural network model.
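A small sketch of the zero-padding and window split described above (plain Python, illustrative names):

```python
def split_with_window(nodes, window, zero_node=0):
    """Pad the layer's node sequence with zero nodes to an integral multiple
    of the context window length, then cut it into small segment sequences."""
    padded = nodes + [zero_node] * (-len(nodes) % window)
    return [padded[i:i + window] for i in range(0, len(padded), window)]

# split_with_window(list(range(1, 37)), 8) yields five segments of length 8,
# the last padded with four zero nodes, matching the example above.
```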
On the basis of the above embodiment, as an alternative embodiment, the performing lossless compression on the small segment sequence includes:
grouping nodes in the small segment sequence;
that is, the group number corresponding to each node in the small segment sequence is preferably uniformly distributed in the grouping, so as to ensure that the context information is uniformly distributed in the space.
Determining the OctValue probability distribution of each node in the small segment sequence by using the self-attention neural network model, the first mask tensor, the second mask tensor, and the node information of each node and its ancestor nodes in the small segment sequence; the node information comprises the number of the octree layer where the node is located, the position of the node within its parent node, and the OctValue of the node; the OctValue of a node is an eight-bit binary number representing whether points exist in each of the node's eight subspaces; the first mask tensor masks the OctValue of all nodes in the small segment sequence; the second mask tensor masks, during the learning of the OctValue probability distribution of each node in the small segment sequence, the OctValue of any node whose group number is larger than that of the node being predicted.
The self-attention neural network model comprises two Transformer structures. When the first mask tensor is introduced into the first Transformer structure so that it learns the OctValue probability distribution of each node in the small segment sequence, the context information used comprises the node information of the ancestor nodes traced back from each node in the small segment sequence, the octree layer where each node in the small segment sequence is located, and the position of each node within its parent node; the OctValue of no node in the small segment sequence is used;
when the second mask tensor is introduced into the second Transformer structure so that it learns the OctValue probability distribution of each node in the small segment sequence, the context information used comprises the node information of the ancestor nodes traced back from each node, the octree layer where each node is located, the position of each node within its parent node, and the OctValues of the already-encoded groups in the small segment sequence (for the p-th group, the encoded groups are groups p-1 down to 1); the OctValues of the not-yet-encoded groups in the small segment sequence (for the p-th group, groups p+1 to M) are not used.
Because grouped decoding has a fixed order, the mask tensor must ensure that the OctValue of a later-decoded node is not used during encoding; this setting conforms to the decoding rule. After the sum of the outputs of the first Transformer structure and the second Transformer structure passes through a linear layer, the OctValue probability distribution can be learned.
Assuming the number of groups M is 4 and a segment of eight nodes whose group numbers are 1, 2, 3, 4, 1, 2, 3, 4 in order, the second mask tensor can be as shown in Table 1, where rows and columns are labeled with the group number of the corresponding node:

TABLE 1

        1   2   3   4   1   2   3   4
    1   0               0
    2   0   0           0   0
    3   0   0   0       0   0   0
    4   0   0   0   0   0   0   0   0
    1   0               0
    2   0   0           0   0
    3   0   0   0       0   0   0
    4   0   0   0   0   0   0   0   0
Here 0 indicates that the OctValue information is available, i.e., a later group can use the OctValue information of an earlier group, but an earlier group cannot use the OctValue information of a later group.
Correspondingly, in the graphical representation, 0 corresponds to black and the remaining entries correspond to white.
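The two mask tensors can be sketched as availability matrices as follows (a minimal sketch; whether the mask is applied multiplicatively in a Scale layer, as described here, or as the additive -inf mask common in Transformer code is left to the implementation):

```python
import torch

def build_masks(n, m):
    """Availability matrices for a segment of n nodes split into m groups,
    node i belonging to group (i % m) + 1. In `first`, no OctValue is
    available; in `second`, a node may use the OctValue of nodes whose
    group number is not larger than its own."""
    group = torch.arange(n) % m
    first = torch.zeros(n, n, dtype=torch.bool)        # everything masked
    second = group[None, :] <= group[:, None]          # column group <= row group
    return first, second

# build_masks(8, 4)[1].int() reproduces the availability pattern of Table 1,
# with 1 (True) wherever the table shows 0 ("available").
```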
The invention designs two Transformer channels and applies different mask operations to encode the octree, realizing effective decoding and reducing the decoding complexity.
Based on the OctValue probability distribution of each node in the small segment sequence, node block coding is then performed on the small segment sequence in increasing group-number order, obtaining the coding result of the small segment sequence.
In practice, because the OctValue of every node in the small segment sequence is known during encoding, the encoding of all groups of nodes may be performed simultaneously.
Decoding is the inverse of encoding. Since decoding the next group of nodes requires the OctValues of the previous group, which only become available after that group has been decoded, decoding must proceed strictly in increasing group-number order; the OctValues used for the first group of nodes are initialized to 0.
The invention uses the self-attention neural network model to develop sufficient and effective context information for each node, improving the OctValue prediction and thereby the compression bpp.
On the basis of the foregoing embodiment, as an optional embodiment, the grouping nodes in the small segment sequence includes:
the node serial numbers of nodes in the same group differ by LM;
wherein M is the number of groups, selected as required, and L is an integer greater than or equal to 1.
For example, with M set to 4 and the small segment sequence {y1, y2, …, y16}, after grouping, {y1, y5, y9, y13} belong to the first group, {y2, y6, y10, y14} to the second group, {y3, y7, y11, y15} to the third group, and {y4, y8, y12, y16} to the fourth group.
the invention adopts a uniform grouping mode, so that when each node in a small segment sequence is coded, the nodes which can use the OctValue information in the brother nodes are uniformly distributed, and part of the nodes are positioned in front of the node and part of the nodes are positioned behind the node, thereby being beneficial to improving the context information of the node and improving the compression ratio.
Based on the foregoing embodiment, as an alternative embodiment, the determining the OctValue probability distribution of each node in the small segment sequence by using the self-attention neural network model, the first mask tensor, the second mask tensor, and the node information of each node in the small segment sequence and its ancestor nodes includes:
representing the node information of each node and its ancestor nodes in the small segment sequence in the form of a three-dimensional tensor;
namely, the size of the three-dimensional tensor can be (N, K, 3), where N is the number of nodes contained in the small segment sequence, K is the number of ancestor nodes traced upwards, and 3 refers to the 3 kinds of node information.
The three-dimensional tensor is then processed by the self-attention neural network model, based on the first mask tensor and the second mask tensor, to obtain the OctValue probability distribution of each node in the small segment sequence.
Self-attention is adopted to extract the context information of the octree nodes, so that more context information can be used when encoding them, improving the compression result.
On the basis of the foregoing embodiment, as an optional embodiment, the three-dimensional tensor size is (N, K, 3), and the self-attention neural network model comprises an Embedding layer, a first Reshape layer, a first Transformer network structure, a second Transformer network structure, a second Reshape layer, a first Linear layer and a SoftMax layer;
processing the three-dimensional tensor by using the self-attention neural network model based on the first mask tensor and the second mask tensor to obtain an OctValue probability distribution of each node in the small segment sequence, including:
converting the three-dimensional tensor to size (N, K, S) in the Embedding layer;
reshaping the (N, K, S) three-dimensional tensor into a two-dimensional tensor of size (N, KS) in the first Reshape layer;
in the first Transformer network structure, performing node OctValue probability distribution learning on the two-dimensional tensor using the first mask tensor, to obtain a first learning tensor of size (N, KS);
in the second Transformer network structure, performing node OctValue probability distribution learning on the two-dimensional tensor using the second mask tensor, to obtain a second learning tensor of size (N, KS);
reshaping the sum of the first learning tensor and the second learning tensor into a two-dimensional tensor of size (N, KS) in the second Reshape layer;
regularizing the two-dimensional tensor output by the second Reshape layer with the first Linear layer, to obtain a two-dimensional tensor of size (N, 255);
normalizing the (N, 255) two-dimensional tensor in the SoftMax layer, the OctValue probability distribution of each node in the small segment sequence being obtained from the normalization result; wherein N is the number of nodes contained in the small segment sequence, K is the number of ancestor nodes traced upwards, and S is any integer greater than 3.
It should be noted that a Positional Encoding layer may further be included between the first Reshape layer and the Transformer network structures, to mark the positions of the sequence nodes. Between the first Linear layer and the SoftMax layer, a Truncate layer may further be included, which cuts off the zero-node part of the (N, 255) two-dimensional tensor to obtain a two-dimensional tensor of size (N0, 255), so that the SoftMax layer normalizes the (N0, 255) tensor and yields the OctValue probability distributions of N0 nodes, where N0 is the number of non-zero nodes in the small segment sequence. The first Linear layer may be implemented as a multi-layer perceptron (MLP).
The self-attention neural network model is similar to a Transformer encoder structure; according to the grouped decoding order, the mask tensor masks the information that cannot be used during decoding. Context information and decoding complexity are thus both taken into account, improving compression speed and compression rate.
On the basis of the foregoing embodiments, as an alternative embodiment, the first Transformer network structure and the second Transformer network structure are identical. Fig. 2 illustrates the schematic structural diagram of the second Transformer network structure; as shown in fig. 2, it comprises R encoders stacked in series;
the encoder comprises a first sub-layer connection structure and a second sub-layer connection structure taking the output of the first sub-layer connection structure as the input;
the first sub-layer connection structure comprises a multi-head self-attention sub-layer, a normalization layer and a residual connection;
the second sub-layer connection structure comprises a second Linear layer, a normalization layer and a residual connection;
in the first Transformer network structure, performing node OctValue probability distribution learning on the two-dimensional tensor using the first mask tensor, to obtain a first learning tensor of size (N, KS), comprises:
taking the two-dimensional tensor as the input of the multi-head self-attention sub-layer of the first encoder in the first Transformer network structure, taking the first mask tensor as the input mask tensor of the multi-head self-attention sub-layer of each encoder in the first Transformer network structure, and performing node OctValue probability distribution learning on the two-dimensional tensor with the first Transformer network structure, to obtain a first learning tensor of size (N, KS);
in the second Transformer network structure, performing node OctValue probability distribution learning on the two-dimensional tensor using the second mask tensor, to obtain a second learning tensor of size (N, KS), comprises:
taking the two-dimensional tensor as the input of the multi-head self-attention sub-layer of the first encoder in the second Transformer network structure, taking the second mask tensor as the input mask tensor of the multi-head self-attention sub-layer of each encoder in the second Transformer network structure, and performing node OctValue probability distribution learning on the two-dimensional tensor with the second Transformer network structure, to obtain a second learning tensor of size (N, KS).
Specifically, the second Linear layer may be a multi-layer perceptron (MLP). The first Transformer network structure and the second Transformer network structure are completely identical except for their input mask tensors; the implementation of the second Transformer network structure is described below:
because the encoders are stacked in series, the input of the first encoder in the second Transformer network structure is the two-dimensional tensor, the output of the last encoder is the output of the second Transformer network structure, and the output of the a-th encoder is the input of the (a+1)-th encoder.
The first encoder of the second Transformer network structure processes the two-dimensional tensor as follows:
in the first sub-layer connection structure, self-attention learning is performed on the two-dimensional tensor by the multi-head self-attention sub-layer, obtaining a weighted two-dimensional tensor;
the weighted two-dimensional tensor is normalized, the normalized result is added to the two-dimensional tensor through the residual connection, and the sum is input into the second sub-layer connection structure;
in the second sub-layer connection structure, the input is processed by the second Linear layer and the normalization layer, then added to the input through the residual connection; the sum is the output of the first encoder of the second Transformer network structure.
Other encoders are similar and will not be described in detail.
Wherein performing self-attention learning on the two-dimensional tensor with the multi-head self-attention sub-layer to obtain the weighted two-dimensional tensor comprises:
splitting the two-dimensional tensor into K head tensors, each of size (N, KS/K);
inputting the K tensors into a fourth Linear layer, a fifth Linear layer and a sixth Linear layer; the fourth, fifth and sixth Linear layers can all adopt multi-layer perceptrons (MLPs), which let the feature vectors fully interact across dimensions so that the network can capture more non-linear and combined features;
multiplying the output matrices of the K heads of the fourth Linear layer with the transposed output matrices of the K heads of the fifth Linear layer in the first MatMul layer, obtaining K corresponding matrices of size (N, N);
in the Scale layer, element-wise multiplying the second mask matrix of size (N, N) with the K scaled matrices of size (N, N); after the result passes through a SoftMax layer, it is multiplied with the output matrix of the sixth Linear layer in the second MatMul layer, and the weighted two-dimensional tensor is obtained after concatenating the heads.
Here, the multi-head attention mechanism helps the network capture rich feature information, and the residual connections alleviate the vanishing-gradient problem and the weakening of the weight matrix.
The method adopts a multi-head attention mechanism to learn the features of the two-dimensional tensor in depth and distributes attention weights based on the mask tensor, improving the accuracy of the learned OctValue probability distribution.
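A from-scratch sketch of the masked multi-head self-attention pass described above (the additive -inf treatment of masked positions before the SoftMax is an assumption standing in for the element-wise Scale-layer multiplication, and the three projection matrices stand in for the fourth, fifth and sixth Linear layers):

```python
import math
import torch

def masked_self_attention(x, wq, wk, wv, avail, heads):
    """x: (N, D) two-dimensional tensor; wq/wk/wv: (D, D) projections;
    avail: (N, N) boolean mask, True where attention is allowed."""
    N, D = x.shape
    d = D // heads
    def split(t):                                  # (N, D) -> (heads, N, d)
        return t.reshape(N, heads, d).transpose(0, 1)
    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ k.transpose(1, 2) / math.sqrt(d)  # first MatMul + Scale
    scores = scores.masked_fill(~avail, float('-inf'))
    weights = torch.softmax(scores, dim=-1)        # SoftMax layer
    out = weights @ v                              # second MatMul
    return out.transpose(0, 1).reshape(N, D)       # concatenate the heads
```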
On the basis of the above embodiment, as an alternative embodiment, the self-attention neural network model is constructed on the basis of a pre-trained neural network;
the pre-training neural network is trained by using octree node small segment sequence samples with nodes OctValue randomly set to zero.
According to the method, a pretrainin strategy is adopted to receive BERT inspiration, the OctValue of an input sample is set to zero with a 50% probability, a pre-trained neural network is trained by taking the original OctValue as a target to obtain a network initial weight, and then formal training is carried out on the basis, so that the effect of a pre-trained self-attention neural network model is obviously better than that of a non-pre-trained self-attention neural network model.
The pre-training operation of the invention improves the learning effect and generalization ability of the self-attention neural network model.
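A minimal sketch of the BERT-style corruption used to build pretraining samples (illustrative; `octvalues` is assumed to be an integer tensor of node OctValues):

```python
import torch

def pretraining_sample(octvalues, p=0.5):
    """Randomly zero each node's OctValue with probability p; the original
    values serve as the prediction target during pretraining."""
    target = octvalues.clone()
    drop = torch.rand(octvalues.shape) < p
    corrupted = octvalues.masked_fill(drop, 0)
    return corrupted, target
```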
In conclusion, the hierarchical, grouped point cloud coding and decoding strategy introduced by this scheme effectively improves the compression rate of deep learning lossless point cloud compression; compared with the SOTA method OctAttention, the compression rate improves on both the LiDAR and Object datasets. More importantly, thanks to the hierarchical grouping, the point cloud compression time is shortened by 97% compared with OctAttention, i.e., to about 1/30 of the original, an improvement that can promote the practical application of neural networks in point cloud compression.
In addition, on the basis of this scheme, better results can be achieved by adopting a larger context window or by adjusting layer parameters, channel numbers, attention modules, and the like. Meanwhile, in practical applications, parallelizing the neural network operations can further greatly improve the compression/decompression speed.
Accordingly, complete point cloud compression also includes a decoding part, which is the inverse of the encoding part; the decoding flow is as follows:
performing parallel lossless decompression on the small segment sequences corresponding to each layer of the octree according to the sequence of the octree levels from top to bottom to obtain a decompressed octree;
converting the decompressed octree into a point cloud structure;
the lossless decompression of the small-segment sequence is realized by grouping and decoding the small-segment sequence by using a self-attention neural network model.
Specifically, the lossless decompression is performed on the small segment of sequence, which includes:
initializing the OctValue values to 0;
decoding a first group of nodes of the small segment sequence by using a self-attention neural network model to obtain OctValue of the first group of nodes of the small segment sequence;
decoding a second group of nodes of the small-segment sequence by using a self-attention neural network model based on the OctValue of the first group of nodes of the small-segment sequence to obtain the OctValue of the second group of nodes of the small-segment sequence;
decoding a third group of nodes of the small-segment sequence by using a self-attention neural network model based on the OctValue of the second group of nodes of the small-segment sequence to obtain the OctValue of the third group of nodes of the small-segment sequence;
and by analogy, a small-segment sequence decompression result is obtained.
The point cloud lossless compression device based on deep learning provided by the invention is described below, and the point cloud lossless compression device based on deep learning described below and the point cloud lossless compression method based on deep learning described above can be referred to correspondingly. Fig. 3 illustrates a schematic structural diagram of a point cloud lossless compression apparatus based on deep learning, as shown in fig. 3, including:
an octree representation module 21, configured to represent the target point cloud by using octrees;
a breadth-first traversal module 22, configured to perform breadth-first traversal on the octree, so as to obtain a node sequence corresponding to each layer of the octree;
a separation module 23, configured to separate, by using a context window, a node sequence corresponding to each layer of the octree to obtain a small segment sequence corresponding to each layer of the octree;
the compression module 24 is configured to perform parallel lossless compression on the small segment sequences corresponding to each layer of the octree according to a sequence of octree levels from top to bottom to obtain a compression result of the target point cloud;
the lossless compression of the small segment sequence is realized by grouping and coding the small segment sequence by using a self-attention neural network model.
On the basis of the above embodiment, as an optional embodiment, the separation module is specifically configured to:
and dividing the node sequence corresponding to each layer of the octree by using the length of the context window as the length of the small segment sequence to obtain the small segment sequence corresponding to each layer of the octree.
On the basis of the foregoing embodiment, as an optional embodiment, the compression module specifically includes:
a grouping unit, configured to group nodes in the small segment sequence;
a determining unit, configured to determine, by using a self-attention neural network model, a first mask tensor, a second mask tensor, and node information of each node and an ancestor node thereof in the small segment sequence, an OctValue probability distribution of each node in the small segment sequence;
the block coding unit is used for carrying out node block coding on the small segment sequence according to the increasing sequence of the group number based on the OctValue probability distribution of each node in the small segment sequence to obtain the coding result of the small segment sequence;
the node information comprises the number of layers of an octree where the node is located, the position of the node in a father node and the OctValue of the node; the OctValue of the node is an eight-bit binary number representing the existence condition of eight subspace points of the node;
the first mask tensor masks the OctValue of all nodes in the small segment sequence; the second mask tensor masks, during the learning of the OctValue probability distribution of each node in the small segment sequence, the OctValue of any node whose group number is larger than that of the node being predicted.
On the basis of the foregoing embodiment, as an optional embodiment, the grouping unit is specifically configured to:
the node serial numbers between the nodes in the same group are separated by LM;
wherein M is the number of packet groups, and L is an integer greater than or equal to 1.
On the basis of the above embodiment, as an alternative embodiment, the determining unit includes
The three-dimensional tensor representation submodule is used for representing the node information of each node and an ancestor node thereof in the small segment sequence in a three-dimensional tensor form;
and the processing submodule is used for processing the three-dimensional tensor by using the self-attention neural network model based on the first mask tensor and the second mask tensor to obtain the OctValue probability distribution of each node in the small segment sequence.
On the basis of the foregoing embodiment, as an optional embodiment, the three-dimensional tensor size is (N, K, 3), and the self-attention neural network model comprises an Embedding layer, a first Reshape layer, a first Transformer network structure, a second Transformer network structure, a second Reshape layer, a first Linear layer and a SoftMax layer;
the processing submodule specifically includes:
a tensor conversion subunit, configured to convert the three-dimensional tensor to size (N, K, S) in the Embedding layer;
a first shaping subunit, configured to reshape the (N, K, S) three-dimensional tensor into a two-dimensional tensor of size (N, KS) in the first Reshape layer;
a first learning subunit, configured to perform node OctValue probability distribution learning on the two-dimensional tensor using the first mask tensor in the first Transformer network structure, to obtain a first learning tensor of size (N, KS);
a second learning subunit, configured to perform node OctValue probability distribution learning on the two-dimensional tensor using the second mask tensor in the second Transformer network structure, to obtain a second learning tensor of size (N, KS);
a second shaping subunit, configured to reshape the sum of the first learning tensor and the second learning tensor into a two-dimensional tensor of size (N, KS) in the second Reshape layer;
a regularization subunit, configured to regularize the two-dimensional tensor output by the second Reshape layer with the first Linear layer, to obtain a two-dimensional tensor of size (N, 255);
a normalization subunit, configured to normalize the (N, 255) two-dimensional tensor in the SoftMax layer, the OctValue probability distribution of each node in the small segment sequence being obtained from the normalization result; wherein N is the number of nodes contained in the small segment sequence, K is the number of ancestor nodes traced upwards, and S is any integer greater than 3.
On the basis of the above embodiment, as an alternative embodiment, the first Transformer network structure and the second Transformer network structure are identical and are formed by serially stacking R encoders;
the encoder comprises a first sub-layer connection structure and a second sub-layer connection structure connected with the first sub-layer connection structure;
the first sub-layer connection structure comprises a multi-head self-attention sub-layer, a normalization layer and a residual connection;
the second sub-layer connection structure comprises a second Linear layer, a normalization layer and a residual connection;
the first learning subunit is specifically configured to:
take the two-dimensional tensor as the input of the multi-head self-attention sub-layer of the first encoder in the first Transformer network structure, take the first mask tensor as the input mask tensor of the multi-head self-attention sub-layer of each encoder in the first Transformer network structure, and perform node OctValue probability distribution learning on the two-dimensional tensor with the first Transformer network structure, obtaining a first learning tensor of size (N, KS);
the second learning subunit is specifically configured to:
take the two-dimensional tensor as the input of the multi-head self-attention sub-layer of the first encoder in the second Transformer network structure, take the second mask tensor as the input mask tensor of the multi-head self-attention sub-layer of each encoder in the second Transformer network structure, and perform node OctValue probability distribution learning on the two-dimensional tensor with the second Transformer network structure, obtaining a second learning tensor of size (N, KS);
on the basis of the above embodiment, as an alternative embodiment, the self-attention neural network model is constructed on the basis of a pre-trained neural network;
the pre-training neural network is trained by using octree node small segment sequence samples with nodes OctValue randomly set to zero.
Fig. 4 illustrates a physical structure diagram of an electronic device, which may include, as shown in fig. 4: a processor (processor) 410, a communication Interface (Communications Interface) 420, a memory (memory) 430 and a communication bus 440, wherein the processor 410, the communication Interface 420 and the memory 430 are in communication with each other via the communication bus 440. The processor 410 may invoke logic instructions in the memory 430 to perform a deep learning based point cloud lossless compression method, the method comprising: representing a target point cloud by using an octree; performing breadth-first traversal on the octree to obtain a node sequence corresponding to each layer of the octree; separating the node sequence corresponding to each layer of the octree by using a context window to obtain a small segment sequence corresponding to each layer of the octree; performing parallel lossless compression on the small segment sequences corresponding to each layer of the octree according to the sequence of the octree layers from top to bottom to obtain a compression result of the target point cloud; the lossless compression of the small segment sequence is realized by grouping and coding the small segment sequence by using a self-attention neural network model.
In addition, the logic instructions in the memory 430 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, and various media capable of storing program codes.
In another aspect, the present invention further provides a computer program product comprising a computer program storable on a non-transitory computer-readable storage medium; when the computer program is executed by a processor, it performs the point cloud lossless compression method based on deep learning, the method comprising: representing a target point cloud by an octree; performing breadth-first traversal on the octree to obtain the node sequence corresponding to each layer of the octree; separating the node sequence corresponding to each layer of the octree with a context window to obtain the small segment sequences corresponding to each layer of the octree; and performing parallel lossless compression on the small segment sequences corresponding to each layer of the octree in top-to-bottom order of the octree layers to obtain a compression result of the target point cloud; wherein the lossless compression of the small segment sequence is realized by grouping and block-coding the nodes of the small segment sequence using a self-attention neural network model.

In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the point cloud lossless compression method based on deep learning, the method comprising: representing a target point cloud by an octree; performing breadth-first traversal on the octree to obtain the node sequence corresponding to each layer of the octree; separating the node sequence corresponding to each layer of the octree with a context window to obtain the small segment sequences corresponding to each layer of the octree; and performing parallel lossless compression on the small segment sequences corresponding to each layer of the octree in top-to-bottom order of the octree layers to obtain a compression result of the target point cloud; wherein the lossless compression of the small segment sequence is realized by grouping and block-coding the nodes of the small segment sequence using a self-attention neural network model.
The above-described apparatus embodiments are merely illustrative; the units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units, i.e., they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment, which those of ordinary skill in the art can understand and implement without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment may be implemented by software plus a necessary general-purpose hardware platform, or by hardware alone. Based on this understanding, the above technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute the methods described in the embodiments or parts thereof.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced, and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (11)

1. A point cloud lossless compression method based on deep learning, characterized by comprising the following steps:
representing a target point cloud by using an octree;
performing breadth-first traversal on the octree to obtain a node sequence corresponding to each layer of the octree;
separating the node sequence corresponding to each layer of the octree by using a context window to obtain a small segment sequence corresponding to each layer of the octree;
performing parallel lossless compression on the small segment sequences corresponding to each layer of the octree according to the sequence of the octree layers from top to bottom to obtain a compression result of the target point cloud;
wherein the lossless compression of the small segment sequence is realized by grouping and block-coding the nodes of the small segment sequence using a self-attention neural network model.
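By way of illustration only (not part of the claim), the layer-by-layer parallelism recited above can be pictured with the following sketch; `encode_segment` is a hypothetical per-segment coder, and layers must still be handled top-down because child nodes depend on their ancestors:

```python
from concurrent.futures import ProcessPoolExecutor

def compress_point_cloud(layer_segments, encode_segment):
    """Compress each layer's small segment sequences in parallel,
    proceeding through the layers from top to bottom.

    `layer_segments`: list (top layer first) of lists of segments;
    returns per-layer, per-segment bitstreams in the same order."""
    compressed = []
    with ProcessPoolExecutor() as pool:
        for segments in layer_segments:              # top-to-bottom layer order
            compressed.append(list(pool.map(encode_segment, segments)))
    return compressed
```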
2. The point cloud lossless compression method based on deep learning of claim 1, wherein separating the node sequence corresponding to each layer of the octree with a context window to obtain the small segment sequence corresponding to each layer of the octree comprises:
dividing the node sequence corresponding to each layer of the octree, using the length of the context window as the length of the small segment sequence, to obtain the small segment sequence corresponding to each layer of the octree.
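Illustratively (not part of the claim), this division reduces to fixed-length chunking; how a final short chunk is handled is an implementation choice the claim leaves open:

```python
def split_into_segments(layer_nodes, window_len):
    """Divide one layer's node sequence into small segment sequences whose
    length equals the context window length; the last may be shorter."""
    return [layer_nodes[i:i + window_len]
            for i in range(0, len(layer_nodes), window_len)]
```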
3. The point cloud lossless compression method based on deep learning of claim 1, wherein the lossless compression of the small segment sequence comprises:
grouping the nodes in the small segment sequence;
determining the OctValue probability distribution of each node in the small segment sequence by using the self-attention neural network model, a first mask tensor, a second mask tensor, and the node information of each node in the small segment sequence and of its ancestor nodes;
based on the OctValue probability distribution of each node in the small segment sequence, performing node block coding on the small segment sequence in order of increasing group number to obtain the coding result of the small segment sequence;
wherein the node information comprises the layer of the octree in which the node is located, the position of the node within its parent node, and the OctValue of the node; the OctValue of a node is an eight-bit binary number indicating, for each of the node's eight subspaces, whether that subspace contains points;
the first mask tensor serves to mask the OctValues of all nodes in the small segment sequence; the second mask tensor serves to mask, during the learning of the OctValue probability distribution of each node in the small segment sequence, the OctValues of the nodes whose group number is larger than that of the node in question.
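One plausible reading of the two mask tensors, offered purely as a sketch, writes them as N x N boolean matrices in which entry [i, j] == True means node i must not see node j's OctValue; the exact layout the model consumes is not fixed by the claim:

```python
import torch

def build_mask_tensors(groups):
    """Build the first and second mask tensors for one small segment
    sequence; `groups` is a length-N integer tensor of group numbers."""
    n = groups.shape[0]
    first_mask = torch.ones(n, n, dtype=torch.bool)          # every OctValue hidden
    second_mask = groups.unsqueeze(0) > groups.unsqueeze(1)  # hide later groups only
    return first_mask, second_mask
```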
4. The point cloud lossless compression method based on deep learning of claim 3, wherein grouping the nodes in the small segment sequence comprises:
assigning nodes to groups such that the serial numbers of any two nodes in the same group differ by L·M;
wherein M is the number of groups and L is an integer greater than or equal to 1.
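Since serial numbers within a group are separated by multiples of M, the grouping reduces to the serial number modulo M; a minimal sketch:

```python
def assign_groups(num_nodes, num_groups):
    """Assign node i (serial numbers 0..N-1) to group i mod M, so serial
    numbers within one group differ by L*M for integers L >= 1."""
    return [i % num_groups for i in range(num_nodes)]

# Example: assign_groups(8, 4) -> [0, 1, 2, 3, 0, 1, 2, 3]
```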
5. The point cloud lossless compression method based on deep learning of any one of claims 3 to 4, wherein the determining the OctValue probability distribution of each node in the small segment sequence by using the self-attention neural network model, the first mask tensor, the second mask tensor, and the node information of each node and its ancestor nodes in the small segment sequence includes:
expressing the node information of each node in the small segment sequence, and of its ancestor nodes, in the form of a three-dimensional tensor;
processing the three-dimensional tensor by using the self-attention neural network model based on the first mask tensor and the second mask tensor to obtain the OctValue probability distribution of each node in the small segment sequence.
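One way to realize this tensor, as a sketch under our own assumptions (the K entries per node are taken as the node itself plus its K-1 nearest ancestors, ancestor-first, with repetition padding near the root; attribute names are hypothetical):

```python
import numpy as np

def to_context_tensor(segment, k):
    """Pack each node's (layer, position-in-parent, OctValue) triple,
    together with those of its ancestors, into an (N, K, 3) array."""
    rows = []
    for node in segment:
        chain, cur = [], node
        while cur is not None and len(chain) < k:
            chain.append([cur.layer, cur.pos_in_parent, cur.octvalue])
            cur = cur.parent
        while len(chain) < k:          # pad when the chain reaches the root
            chain.append(chain[-1])
        rows.append(chain[::-1])       # ancestor-first ordering (assumed)
    return np.asarray(rows, dtype=np.float32)
```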
6. The point cloud lossless compression method based on deep learning of claim 5, wherein the three-dimensional tensor is of size (N, K, 3), and the self-attention neural network model comprises an Embedding layer, a first Reshape layer, a first Transformer network structure, a second Transformer network structure, a second Reshape layer, a first Linear layer and a SoftMax layer;
processing the three-dimensional tensor by using the self-attention neural network model based on the first mask tensor and the second mask tensor to obtain an OctValue probability distribution of each node in the small segment sequence, including:
in the Embedding layer, converting the three-dimensional tensor of size (N, K, 3) into a three-dimensional tensor of size (N, K, S);
in the first Reshape layer, reshaping the three-dimensional tensor of size (N, K, S) into a two-dimensional tensor of size (N, KS);
in the first Transformer network structure, performing node OctValue probability distribution learning on the two-dimensional tensor using the first mask tensor to obtain a first learning tensor of size (N, KS);
in the second Transformer network structure, performing node OctValue probability distribution learning on the two-dimensional tensor using the second mask tensor to obtain a second learning tensor of size (N, KS);
in the second Reshape layer, shaping the sum of the first learning tensor and the second learning tensor into a two-dimensional tensor of size (N, KS);
passing the two-dimensional tensor output by the second Reshape layer through the first Linear layer to obtain a two-dimensional tensor of size (N, 255);
in the SoftMax layer, normalizing the two-dimensional tensor of size (N, 255) and obtaining the OctValue probability distribution of each node in the small segment sequence from the normalization result;
wherein N is the number of nodes contained in the small segment sequence, K is the number of ancestor nodes traced upwards, and S is an integer greater than 3.
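A minimal PyTorch sketch of this layer stack follows, under stated assumptions: the first mask ("hide every OctValue") is realized here by zeroing the OctValue channel fed to the first branch, which keeps the sketch runnable, while the second mask enters the self-attention sublayers as a standard boolean attention mask; the head count and feed-forward width are choices the claim leaves open.

```python
import torch
import torch.nn as nn

class OctValueProbModel(nn.Module):
    """Sketch of the Embedding / Reshape / dual-Transformer / Linear /
    SoftMax stack yielding an (N, 255) OctValue distribution per node."""

    def __init__(self, k=4, s=128, r=2, heads=8):
        super().__init__()
        self.k, self.s = k, s
        self.embed = nn.Linear(3, s)          # Embedding: (N, K, 3) -> (N, K, S)

        def branch():                         # R serially stacked encoders
            layer = nn.TransformerEncoderLayer(
                d_model=k * s, nhead=heads,
                dim_feedforward=4 * k * s, batch_first=True)
            return nn.TransformerEncoder(layer, num_layers=r)

        self.branch1, self.branch2 = branch(), branch()
        self.linear = nn.Linear(k * s, 255)   # first Linear layer
        # 255 classes: OctValue is an 8-bit occupancy code taking values 1..255

    def forward(self, x, second_mask):
        # x: (N, K, 3) float features (layer, position in parent, OctValue)
        n = x.shape[0]
        x1 = x.clone()
        x1[..., 2] = 0                                      # first mask: all OctValues hidden
        h1 = self.embed(x1).reshape(1, n, self.k * self.s)  # first Reshape -> (N, KS)
        h2 = self.embed(x).reshape(1, n, self.k * self.s)
        t1 = self.branch1(h1)                               # first learning tensor  (N, KS)
        t2 = self.branch2(h2, mask=second_mask)             # second learning tensor (N, KS)
        h = (t1 + t2).reshape(n, self.k * self.s)           # second Reshape
        return torch.softmax(self.linear(h), dim=-1)        # SoftMax -> (N, 255)
```

With the group mask from the earlier sketch, a call such as `model(features.float(), second_mask)` would yield the per-node distributions consumed by the block-coding step.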
7. The point cloud lossless compression method based on deep learning of claim 6, wherein the first Transformer network structure and the second Transformer network structure are identical in structure, each formed by stacking R encoders in series;
each encoder comprises a first sublayer connection structure and a second sublayer connection structure connected to the first sublayer connection structure;
the first sublayer connection structure comprises a multi-head self-attention sublayer, a normalization layer and a residual connection;
the second sublayer connection structure comprises a second Linear layer, a normalization layer and a residual connection;
in the first Transformer network structure, performing node OctValue probability distribution learning on the two-dimensional tensor using the first mask tensor to obtain a first learning tensor of size (N, KS) comprises:
taking the two-dimensional tensor as the input of the multi-head self-attention sublayer of the first encoder in the first Transformer network structure, taking the first mask tensor as the input mask tensor of the multi-head self-attention sublayer of each encoder in the first Transformer network structure, and performing node OctValue probability distribution learning on the two-dimensional tensor using the first Transformer network structure to obtain the first learning tensor of size (N, KS);
in the second Transformer network structure, performing node OctValue probability distribution learning on the two-dimensional tensor using the second mask tensor to obtain a second learning tensor of size (N, KS) comprises:
taking the two-dimensional tensor as the input of the multi-head self-attention sublayer of the first encoder in the second Transformer network structure, taking the second mask tensor as the input mask tensor of the multi-head self-attention sublayer of each encoder in the second Transformer network structure, and performing node OctValue probability distribution learning on the two-dimensional tensor using the second Transformer network structure to obtain the second learning tensor of size (N, KS).
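For concreteness, one such encoder might be sketched as below; the post-norm ordering and the mask plumbing are assumptions, not claim limitations:

```python
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One encoder: a multi-head self-attention sublayer and a Linear
    sublayer, each wrapped in a residual connection plus normalization."""

    def __init__(self, d_model, num_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.linear = nn.Linear(d_model, d_model)   # second Linear layer
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, attn_mask=None):
        a, _ = self.attn(x, x, x, attn_mask=attn_mask)  # first sublayer connection
        x = self.norm1(x + a)                           # residual + normalization
        return self.norm2(x + self.linear(x))           # second sublayer connection
```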
8. The point cloud lossless compression method based on deep learning of claim 5, wherein the self-attention neural network model is built on a pre-trained neural network;
the pre-trained neural network is trained with octree node small segment sequence samples in which node OctValues are randomly set to zero.
9. A point cloud lossless compression device based on deep learning, characterized by comprising:
an octree representation module, configured to represent a target point cloud by an octree;
a breadth-first traversal module, configured to perform breadth-first traversal on the octree to obtain the node sequence corresponding to each layer of the octree;
a separation module, configured to separate, with a context window, the node sequence corresponding to each layer of the octree to obtain the small segment sequence corresponding to each layer of the octree;
a compression module, configured to perform parallel lossless compression on the small segment sequences corresponding to each layer of the octree in top-to-bottom order of the octree layers to obtain a compression result of the target point cloud;
wherein the lossless compression of the small segment sequence is realized by grouping and block-coding the nodes of the small segment sequence using a self-attention neural network model.
10. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the point cloud lossless compression method based on deep learning according to any one of claims 1 to 8.
11. A non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the point cloud lossless compression method based on deep learning according to any one of claims 1 to 8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211080672.9A CN115471576A (en) 2022-09-05 2022-09-05 Point cloud lossless compression method and device based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211080672.9A CN115471576A (en) 2022-09-05 2022-09-05 Point cloud lossless compression method and device based on deep learning

Publications (1)

Publication Number Publication Date
CN115471576A (en) 2022-12-13

Family

ID=84371084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211080672.9A Pending CN115471576A (en) 2022-09-05 2022-09-05 Point cloud lossless compression method and device based on deep learning

Country Status (1)

Country Link
CN (1) CN115471576A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116303483A (en) * 2023-05-23 2023-06-23 北京适创科技有限公司 Compression method and device for structured grid, electronic equipment and storage medium
CN116303483B (en) * 2023-05-23 2023-07-21 北京适创科技有限公司 Compression method and device for structured grid, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination