CN110956194A - Three-dimensional point cloud structuring method, classification method, equipment and device - Google Patents


Info

Publication number
CN110956194A
CN110956194A
Authority
CN
China
Prior art keywords
layer
point cloud
local feature
deconvolution
neural network
Prior art date
2019-10-10
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910960562.3A
Other languages
Chinese (zh)
Inventor
梁国远 (Liang Guoyuan)
陈帆 (Chen Fan)
周翊民 (Zhou Yimin)
何升展 (He Shengzhan)
吴新宇 (Wu Xinyu)
冯伟 (Feng Wei)
武臻 (Wu Zhen)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN201910960562.3A
Publication of CN110956194A
Current legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition; G06F18/20 Analysing; G06F18/24 Classification techniques
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models; G06N3/02 Neural networks; G06N3/04 Architecture, e.g. interconnection topology; G06N3/045 Combinations of networks
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration; G06T5/50 using two or more images, e.g. averaging or subtraction
    • G06T2207/00 Indexing scheme for image analysis or image enhancement; G06T2207/10028 Range image; depth image; 3D point clouds
    • G06T2207/20084 Artificial neural networks [ANN]


Abstract

The application relates to the technical field of neural network processing, and discloses a three-dimensional point cloud structuring method, a three-dimensional point cloud classification method, equipment, and a device. The method comprises the following steps: extracting features from the original three-dimensional point cloud through a local feature extraction network to obtain local feature vectors of local regions centered on a plurality of points of the point cloud; performing deconvolution mapping on the local feature vector of each center point through a deconvolution neural network to obtain a plurality of local feature maps, the local feature maps corresponding one-to-one to the local feature vectors; and max-pooling the plurality of local feature maps through a first max-pooling layer to obtain a fused image. By this method, an unstructured original three-dimensional point cloud can be converted into structured image data.

Description

Three-dimensional point cloud structuring method, classification method, equipment and device
Technical Field
The present application relates to the field of neural network processing technologies, and in particular, to a three-dimensional point cloud structuring method, a classification method, equipment, and a device.
Background
Research on deep learning models for three-dimensional shapes started later than in the field of two-dimensional images. Images are structured and can be represented as matrices on a two-dimensional plane, but three-dimensional point clouds and meshes are unstructured and cannot be fed directly into a deep neural network.
Disclosure of Invention
The technical problem mainly addressed by the present application is to provide a three-dimensional point cloud structuring method, a classification method, equipment, and a device that can convert an unstructured original three-dimensional point cloud into structured image data.
In one aspect, the present application provides a method for structuring a three-dimensional point cloud based on a deconvolution neural network, the method including: extracting features from the original three-dimensional point cloud through a local feature extraction network to obtain local feature vectors of local regions centered on a plurality of points of the point cloud; performing deconvolution mapping on the local feature vector of each center point through the deconvolution neural network to obtain a plurality of local feature maps, the local feature maps corresponding one-to-one to the local feature vectors; and max-pooling the plurality of local feature maps through a first max-pooling layer to obtain a fused image.
In another aspect, the present application provides a method for classifying a three-dimensional point cloud based on a deconvolution neural network, the method including the method for structuring a three-dimensional point cloud based on a deconvolution neural network as described above, and classifying the fused image through a classification network to realize the classification of the three-dimensional point cloud.
In yet another aspect, the present application provides an image classification device based on a deconvolution neural network, the device including a memory and a processor coupled to the memory; the processor cooperates with the memory to implement the above method for classifying a three-dimensional point cloud based on a deconvolution neural network.
In still another aspect, the present application provides an apparatus having a storage function, the apparatus storing program data that, when executed, implements the above method for structuring a three-dimensional point cloud based on a deconvolution neural network, or the above method for classifying a three-dimensional point cloud based on a deconvolution neural network.
The beneficial effects of the present application are as follows. Different from the prior art, the present application extracts features from the original three-dimensional point cloud through a local feature extraction network to obtain local feature vectors of local regions centered on a plurality of points of the point cloud; performs deconvolution mapping on the local feature vector of each center point through a deconvolution neural network, which automatically learns the projection mapping from points to images and retains local feature information useful for three-dimensional point cloud classification, obtaining a plurality of local feature maps in one-to-one correspondence with the local feature vectors; and max-pools the plurality of local feature maps through a first max-pooling layer to obtain a fused image. Extracting features from the three-dimensional point cloud with this deconvolution-based neural network converts the unstructured original point cloud into structured image data and markedly improves the robustness of the obtained features.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed for describing the embodiments are briefly introduced below. The drawings in the following description are only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort. In the drawings:
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a method for structuring a three-dimensional point cloud based on a deconvolution neural network according to the present application;
FIG. 2 is a schematic flow chart of another embodiment of the structuring method of the three-dimensional point cloud based on the deconvolution neural network according to the present application;
FIG. 3 is a schematic flow chart of an embodiment of the classification method for three-dimensional point cloud based on deconvolution neural network according to the present application;
FIG. 4 is a schematic flow chart of another embodiment of the classification method of the three-dimensional point cloud based on the deconvolution neural network according to the present application;
FIG. 5 is a schematic structural diagram of an embodiment of an image classification device 50 based on a deconvolution neural network according to the present application;
FIG. 6 is a schematic structural diagram of an embodiment of an apparatus with a storage function according to the present application;
FIG. 7 is a schematic structural diagram of an embodiment of a method for structuring a three-dimensional point cloud based on a deconvolution neural network according to the present application;
FIG. 8 is a schematic diagram of a local feature extraction network according to the present application;
FIG. 9 is a schematic diagram of another structure of a local feature extraction network according to the present application;
FIG. 10 is a schematic structural diagram of another embodiment of the method for structuring a three-dimensional point cloud based on a deconvolution neural network according to the present application;
FIG. 11 is a schematic diagram of a deconvolution neural network of the present application;
FIG. 12 is a schematic diagram of another structure of a deconvolution neural network of the present application;
FIG. 13 shows experimental results of the classification method for three-dimensional point clouds based on a deconvolution neural network according to the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1 and 7, fig. 1 is a schematic flowchart of an embodiment of a method for structuring a three-dimensional point cloud based on a deconvolution neural network, and fig. 7 is a schematic structural diagram of an embodiment of the method for structuring a three-dimensional point cloud based on a deconvolution neural network. The application provides a three-dimensional point cloud structuring method based on a deconvolution neural network, which comprises the following steps:
s11: and performing feature extraction processing on the original three-dimensional point cloud through a local feature extraction network to further obtain a local feature vector of a local area taking a plurality of points of the three-dimensional point cloud as a central point.
Specifically, the original three-dimensional point cloud may be: the 3D scanner scans a target object to obtain a set of points, and the 3D scanner may be a handheld 3D scanner, a point cloud camera, or the like.
In this step, the local feature extraction network may adopt a farthest point sampling algorithm (FPS) to select multiple points in the three-dimensional point cloud as a central point, and take an adjacent area with a radius r on a space where the central point is located, where the points in the adjacent area are a local area set of the central point. And then, the local region set of the central point is encoded into a local feature vector through a micro-dot net layer.
S12: and performing deconvolution mapping processing on the local feature vector corresponding to each central point through a deconvolution neural network to obtain a plurality of local feature mapping maps, wherein the local feature mapping maps correspond to the local feature vectors one by one.
In particular, the deconvolution neural network may have a plurality of convolutional layers and a first maximum pooling layer connected to the plurality of convolutional layers. The deconvolution neural network can adopt a series of connected deconvolution layers, each deconvolution layer is learned on the basis of a local feature mapping graph learned by the last deconvolution layer, and the size of the local feature mapping graph is continuously increased, so that the size of the local feature mapping graph is reduced.
In the application, local feature vectors corresponding to each central point can be subjected to up-sampling learning and object edge optimization by using a local sensitive deconvolution neural network to obtain a local feature map corresponding to each local feature vector, and the local feature map can be at least one of an RGB score map or a depth score map.
S13: and performing maximum pooling on the plurality of local feature maps through the first maximum pooling layer to obtain a fused image.
Specifically, the first maximum value pooling layer may calculate a maximum value from feature values extracted from each neighborhood of the plurality of local feature maps, and integrate the maximum values to obtain a fused image. The maximum value pooling treatment can reserve the maximum value in the local feature mapping image, reduce the deviation of the estimated mean value caused by the parameter error of the deconvolution layer and reserve more feature information. Meanwhile, the first maximum pooling layer can reduce the scale of the input image, simplify the complexity of calculation and reduce the phenomenon of overfitting to a certain extent. In other embodiments, the maximum pooling layer may be replaced with an average pooling layer and a random pooling layer.
Different from the prior art, the present application extracts features from the original three-dimensional point cloud through a local feature extraction network to obtain local feature vectors of local regions centered on a plurality of points of the point cloud; performs deconvolution mapping on the local feature vector of each center point through a deconvolution neural network, which automatically learns the projection mapping from points to images and retains local feature information useful for three-dimensional point cloud classification, obtaining a plurality of local feature maps in one-to-one correspondence with the local feature vectors; and max-pools the plurality of local feature maps through the first max-pooling layer to obtain a fused image, thereby realizing the conversion of the unstructured original three-dimensional point cloud into structured image data.
In one embodiment, referring to fig. 2 and figs. 8-10, fig. 2 is a schematic flowchart of another embodiment of the method for structuring a three-dimensional point cloud based on a deconvolution neural network according to the present application. Fig. 8 is a schematic structural diagram of a local feature extraction network of the present application, fig. 9 is a schematic structural diagram of another local feature extraction network of the present application, and fig. 10 is a schematic structural diagram of another embodiment of the method for structuring a three-dimensional point cloud based on a deconvolution neural network of the present application. The local feature extraction network comprises at least one set abstraction layer, and the set abstraction layer comprises three key layers: a sampling layer, a grouping layer, and a pointnet layer.
Specifically, the sampling layer takes the original three-dimensional point cloud, or the set of center points output by the previous set abstraction layer, as the input point cloud and samples a first point cloud subset from it; the first point cloud subset defines the centroids of the local regions. The grouping layer forms a plurality of second point cloud subsets by finding, in the input point cloud, the neighborhood points around each point of the first point cloud subset taken as a center point. The pointnet layer encodes each second point cloud subset into a local feature vector.
The set abstraction layer takes an N × (d + C) matrix as the input point cloud: N points, each with d-dimensional coordinates and a C-dimensional feature. It outputs an N′ × (d + C′) matrix: N′ points with d-dimensional coordinates and new C′-dimensional feature vectors.
Step S11 includes the following steps:
s21: the sampling layer takes the original three-dimensional point cloud or a central point set output by a previous level set abstraction layer as an input point cloud, and a first point cloud subset is sampled from the input point cloud by utilizing a farthest point sampling algorithm.
In particular, the sampling layer may be a set of central points { x ] output as an original three-dimensional point cloud or via a previous level set abstraction layer1,x2,...,xnUsing the point cloud as an input point cloud, and selecting a first point cloud subset { x ] from the input point cloud by using a furthest point sampling algorithmi1,xi2,...,ximIn which xijIs a subset of distance point clouds { xi1,xi2,...,xij-1The farthest point.
The above farthest point sampling algorithm is specifically: first, randomly selecting a central point set { x }1,x2,...,xnOne point in the lattice, then selecting the point farthest away from the point as the starting point, and repeating the above process until the starting point is reachedUntil the required number is selected, finally sampling a first point cloud subset { xi1,xi2,...,xim}. Compared with random sampling, the farthest point sampling algorithm can more completely sample all point clouds through the central point set.
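A minimal NumPy sketch of the farthest point sampling procedure just described follows; the function name and the random starting point are illustrative choices:

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, m: int) -> np.ndarray:
    """Pick m center points from an (n, 3) cloud by repeatedly taking the
    point farthest from everything selected so far."""
    n = points.shape[0]
    selected = np.zeros(m, dtype=np.int64)
    dist = np.full(n, np.inf)           # distance to nearest selected point
    selected[0] = np.random.randint(n)  # random starting point, as above
    for i in range(1, m):
        diff = points - points[selected[i - 1]]
        dist = np.minimum(dist, np.einsum('ij,ij->i', diff, diff))
        selected[i] = int(np.argmax(dist))
    return selected

cloud = np.random.rand(1024, 3)
centers = cloud[farthest_point_sampling(cloud, 128)]  # 128 center points
```

Keeping a running minimum distance to the selected set makes each iteration O(n), so sampling m centers costs O(nm).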
S22: the grouping layer searches for adjacent area points taking the points in the first point cloud subset as central points from the input point cloud to form a plurality of second point cloud subsets.
In particular, the grouping layer is derived from the input point cloud { x1,x2,...,xnFind with the first point cloud subset { x }i1,xi2,...,ximThe points are K adjacent area points of a center point, wherein the size of the center point is N' × d. And the grouping layer outputs a plurality of groups of second point cloud subsets with the size of N' × K × (d + C), and each group of second point cloud subsets corresponds to a local area.
It should be noted that the K value of each set of second point cloud subsets may be different, and the mesh layer can convert a flexible number of neighboring region points into fixed-length local feature vectors.
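The grouping step can be sketched as a ball query: for each center point, collect up to K points within radius r. Padding short groups by repeating the first neighbor is an assumption (a common PointNet++ convention) so that every group has a fixed size:

```python
import numpy as np

def ball_query(points: np.ndarray, centers: np.ndarray,
               radius: float, k: int) -> np.ndarray:
    """For each center, gather indices of up to k points within `radius`.
    Short groups are padded by repeating their first neighbor (assumed
    convention). Each center is assumed to be a point of the cloud, so a
    group is never empty."""
    groups = []
    for c in centers:
        d2 = np.sum((points - c) ** 2, axis=1)
        idx = np.flatnonzero(d2 <= radius ** 2)[:k]
        if idx.size < k:
            idx = np.concatenate([idx, np.full(k - idx.size, idx[0])])
        groups.append(idx)
    return np.stack(groups)                 # (N', k) indices

cloud = np.random.rand(1024, 3)
centers = cloud[:16]                        # pretend these came from FPS
groups = cloud[ball_query(cloud, centers, radius=0.2, k=32)]  # (16, 32, 3)
```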
S23: and respectively extracting the features of the second point cloud subset by the dot network layer to obtain a local feature vector corresponding to the central point.
Specifically, the point net layer is used as a basic building block for local model learning, and point-to-point relations in a local area can be captured through the coordinates of the central point and the local feature vector of the central point.
And inputting a plurality of groups of second point cloud subsets with the size of N ' × K × (d + C) in the dot net layer, taking each group of second point cloud subsets as a local area, respectively extracting the features of the second point cloud subsets, and outputting local feature vectors with the size of N ' × (d + C '). The method specifically comprises the following steps of converting point coordinates in a local area into local feature vectors relative to a central point: x is the number ofi (j)=xi (j)-x^(j)Wherein i is 1, 2, …, K; j is 1, 2, …, d, where x is the coordinate of the center point.
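A compact sketch of such a pointnet layer, assuming d = 3 coordinates and illustrative channel widths: each group is translated into its center's local frame (the formula above), passed through a shared per-point MLP, and reduced by a max over the K points:

```python
import torch
import torch.nn as nn

class PointNetLayer(nn.Module):
    """Encode each (K, d) local group into one fixed-length feature vector:
    translate to the center's local frame, apply a shared per-point MLP,
    then take the max over the K neighborhood points."""
    def __init__(self, in_dim: int = 3, out_dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 32), nn.ReLU(),
            nn.Linear(32, out_dim), nn.ReLU(),
        )

    def forward(self, groups: torch.Tensor, centers: torch.Tensor) -> torch.Tensor:
        # x_i^(j) = x_i^(j) - x_hat^(j): coordinates relative to the center
        local = groups - centers.unsqueeze(1)   # (N', K, d)
        feats = self.mlp(local)                 # (N', K, out_dim)
        return feats.max(dim=1).values          # (N', out_dim)

layer = PointNetLayer()
vectors = layer(torch.randn(16, 32, 3), torch.randn(16, 3))  # (16, 64)
```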
In an embodiment, the local feature extraction network may include at least two collection abstraction layers arranged in cascade, where a first collection abstraction layer uses an original three-dimensional point cloud as an input point cloud, and a subsequent collection abstraction layer uses a collection of central points processed by a previous collection abstraction layer as an input point cloud.
Specifically, the local feature extraction network may include a first set abstraction layer and a second set abstraction layer arranged in cascade, where each set abstraction layer includes three key layers: a sampling layer, a grouping layer and a dot network layer, wherein the first collection abstraction layer takes the original three-dimensional point cloud as an input point cloud and executes the steps S21-S23 in the embodiment; the second collection abstraction layer uses the collection of the center points processed by the previous collection abstraction layer as the input point cloud, and repeatedly performs steps S21-S23 in the above embodiment.
In this embodiment, the more representative points are selected as the center points of the local regions by the farthest point sampling algorithm; K neighborhood points are then gathered around each center point, the K neighborhood points are treated as one local region, and the pointnet layer extracts features from that region to obtain the local feature vector of the center point. Each set abstraction layer performs sampling, grouping, and feature extraction in turn; repeating this process, with the set of center points produced by one set abstraction layer serving as the input point cloud of the next, realizes hierarchical, iterative extraction and yields the target number of local feature vectors.
Fig. 11 is a schematic diagram of a structure of the deconvolution neural network of the present application. In one embodiment, the deconvolution neural network includes at least one deconvolution layer.
Step S12 includes: the deconvolution layer takes an up-sampling image obtained by up-sampling the central point or a local feature mapping image output by the previous deconvolution layer as an input image, performs deconvolution mapping processing on the input image, and outputs a processed local feature mapping image.
Specifically, the center point is upsampled to obtain an upsampled image, and the upsampling is used for improving the image resolution. The local feature vector corresponding to each central point can be regarded as a feature map with the size of 1 × 1, a single up-sampling structural unit is constructed for this purpose, and the local feature vectors are up-sampled to obtain an up-sampled image with the size of 2 × 2. The process of upsampling is also similar to that of a convolution, except that the input features are interpolated to a larger feature map prior to convolution and then convolved.
The parameters of the deconvolution layer can be set to be twice of the upsampled image, and the learned convolution kernel in the deconvolution layer corresponds to the basic size of the local feature mapping map, so that the structuring of the three-dimensional point cloud based on the deconvolution neural network is realized. The deconvolution layer may use, as an input image, an upsampled image obtained by upsampling a central point or a local feature map output by a previous-stage deconvolution layer, wherein the deconvolution mapping process is an inverse operation of the convolution mapping process.
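A sketch of one such stage under assumed channel counts: each local feature vector is viewed as a 1 × 1 map, interpolated to 2 × 2, and passed through a transposed convolution whose stride-2/kernel-4/padding-1 setting doubles the spatial size:

```python
import torch
import torch.nn as nn

vec = torch.randn(128, 256)        # 128 local feature vectors (assumed width)
fmap = vec.view(128, 256, 1, 1)    # each vector viewed as a 1x1 feature map

upsample = nn.Upsample(scale_factor=2, mode='nearest')  # 1x1 -> 2x2
deconv = nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1)

x = upsample(fmap)                 # (128, 256, 2, 2) upsampled images
x = deconv(x)                      # (128, 128, 4, 4): deconv doubles the size
```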
Fig. 12 is a schematic diagram of a structure of the deconvolution neural network of the present application. In one embodiment, the deconvolution neural network includes at least one depth residual layer.
Step S12 further includes the steps of: and performing depth residual optimization on the processed local feature mapping image through a depth residual layer.
Specifically, the depth residual layer is a residual neural network (ResNet). The depth of a deconvolution neural network matters greatly to its performance, so ideally the network should be as deep as possible, provided it does not overfit. In practice, however, as the depth grows, the gradients in earlier layers tend to vanish (gradient dispersion), the model becomes hard to optimize, and the accuracy of the network can actually drop. Put differently, deepening the network leads to a degradation problem: accuracy first rises, then saturates, and then decreases as depth increases further. This shows that once a plain network reaches a certain depth, its performance saturates and then degrades; the degradation is not caused by overfitting, since both training and test accuracy fall, but indicates that a sufficiently deep plain network is simply hard to train. The depth residual layer is used to solve this performance degradation of deep networks.
In this application, the depth residual layer performs depth residual optimization on the processed local feature map; because the depth residual layer introduces a residual network structure, it alleviates the gradient dispersion caused by a very deep stack of deconvolution layers and improves the accuracy of the structuring of the three-dimensional point cloud.
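A minimal sketch of such a depth residual layer, in the style of a basic ResNet block (the exact layer composition inside the block is an assumption):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A basic ResNet-style block: two 3x3 convolutions plus an identity
    shortcut, used here to refine each local feature map."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The shortcut keeps gradients flowing through deep stacks, which is
        # what counters the gradient-dispersion/degradation problems above.
        return torch.relu(x + self.body(x))
```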
In one embodiment, the deconvolution neural network comprises deconvolution layers and depth residual layers which are alternately arranged in a cascade manner, wherein the first deconvolution layer takes an up-sampled image obtained by up-sampling a central point as an input image, and the subsequent deconvolution layers take a local feature mapping image subjected to depth residual optimization by the previous depth residual layer as an input image.
Specifically, the deconvolution neural network includes deconvolution layers and depth residual layers arranged in an alternating cascade. The plurality of deconvolution layers may include a first deconvolution layer, a second deconvolution layer, a third deconvolution layer, and a fourth deconvolution layer, and the plurality of depth residual layers may include a first depth residual layer, a second depth residual layer, a third depth residual layer, and a fourth depth residual layer.
Each local feature vector is upsampled to obtain an upsampled image; the upsampled images correspond one-to-one to the local feature vectors. An upsampled image is input to the first deconvolution layer, which performs deconvolution mapping on it and outputs a first local feature map twice the size of the upsampled image. The first depth residual layer performs depth feature extraction on the first local feature map. The result is input to the second deconvolution layer, which performs deconvolution mapping and outputs a second local feature map twice the size of the first. The second depth residual layer performs depth feature extraction on the second local feature map. The result is input to the third deconvolution layer, which performs deconvolution mapping and outputs a third local feature map twice the size of the second. The third depth residual layer performs depth feature extraction on the third local feature map. The result is input to the fourth deconvolution layer, which performs deconvolution mapping and outputs a fourth local feature map twice the size of the third. The fourth depth residual layer performs depth feature extraction on the fourth local feature map; the final local feature map is the fourth local feature map after this depth feature extraction.
The following is further described in detail in accordance with the above embodiments of the present application:
The original three-dimensional point cloud is processed by the local feature extraction network to obtain local feature vectors of local regions centered on a plurality of points of the point cloud; the number of local feature vectors may be 128. The local feature vector of each center point can be regarded as a 1 × 1 feature map; a single upsampling unit is constructed for this purpose, and the local feature vectors are upsampled into 128 upsampled images of size 2 × 2. The 128 upsampled images of size 2 × 2 are input to a deconvolution neural network comprising four deconvolution layers and four depth residual layers in an alternating cascade:
1. the first deconvolution layer outputs 128 first local feature maps of size 4 × 4, and the first depth residual layer performs depth feature extraction on them;
2. the second deconvolution layer outputs 128 second local feature maps of size 8 × 8, and the second depth residual layer performs depth feature extraction on them;
3. the third deconvolution layer outputs 128 third local feature maps of size 16 × 16, and the third depth residual layer performs depth feature extraction on them;
4. the fourth deconvolution layer finally outputs 128 fourth local feature maps of size 32 × 32, and the fourth depth residual layer performs depth feature extraction on them.
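Putting the pieces together, a sketch of a four-stage decoder reproducing the 2 × 2 → 32 × 32 growth of steps 1-4 above; the channel widths are assumptions, and the residual block repeats the earlier sketch so the snippet stays self-contained:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):   # repeated from the sketch above
    def __init__(self, ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch),
        )
    def forward(self, x):
        return torch.relu(x + self.body(x))

def stage(in_ch: int, out_ch: int) -> nn.Sequential:
    # One deconvolution (doubles H and W) followed by residual refinement.
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, 4, stride=2, padding=1),
        ResidualBlock(out_ch),
    )

decoder = nn.Sequential(
    stage(256, 128),   # 2x2   -> 4x4   (step 1)
    stage(128, 64),    # 4x4   -> 8x8   (step 2)
    stage(64, 32),     # 8x8   -> 16x16 (step 3)
    stage(32, 16),     # 16x16 -> 32x32 (step 4)
)

x = torch.randn(128, 256, 2, 2)   # the 128 upsampled 2x2 feature maps
maps = decoder(x)                 # (128, 16, 32, 32) local feature maps
fused = maps.max(dim=0).values    # first max-pooling layer -> fused image
```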
Referring to fig. 3, fig. 3 is a schematic flowchart of an embodiment of the classification method for three-dimensional point cloud based on deconvolution neural network according to the present application, and the method includes the following steps:
s31: the method for structuring the three-dimensional point cloud based on the deconvolution neural network is as described above.
Please specifically refer to the above embodiment of the method for structuring a three-dimensional point cloud based on a deconvolution neural network, which is not described herein again.
S32: and classifying the fusion image through a classification network so as to realize the classification of the three-dimensional point cloud.
Specifically, the fused image obtained in step S13 is classified by the classification network to obtain the category of the three-dimensional point cloud.
The present application provides a method for classifying three-dimensional point clouds based on a deconvolution neural network, which overcomes the disorder, sparsity, and limitations of raw three-dimensional point clouds, realizes effective classification of three-dimensional point clouds based on the deconvolution neural network, and achieves high classification accuracy.
In an embodiment, referring to fig. 4, fig. 4 is a schematic flowchart of another embodiment of the method for classifying a three-dimensional point cloud based on a deconvolution neural network according to the present application. The classification network comprises at least one cascaded group consisting of a convolution layer, a batch normalization layer, an activation function layer, and a second max-pooling layer, plus a fully connected layer connected to the second max-pooling layer of the last group.
Specifically, the classification network may include a first convolution layer, a second convolution layer, a third convolution layer, and a fourth convolution layer arranged in cascade, where a batch normalization layer, an activation function layer, and a second max-pooling layer are connected in sequence after each convolution layer.
Step S32 includes the following steps:
s41: and the convolution layer performs two-dimensional convolution operation on the fusion image or the fusion mapping image output by the second maximum pooling layer of the previous group as an input image so as to extract the characteristic data.
Wherein the convolutional layer is used for executing two-dimensional convolution operation, and the two-dimensional convolution operation comprises: and carrying out counterpoint multiplication operation on the convolution kernel matrix data and the sub-matrix data of the convolution kernel matrix data at the current position to obtain a plurality of elements, and carrying out accumulation summation operation on the plurality of elements to obtain a convolution result of the current position. That is to say, in the embodiment of the present application, the submatrix operation unit performs convolution operation by using a bit-by-bit multiplication and summation method.
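A direct NumPy sketch of this elementwise multiply-and-sum convolution (valid padding, single channel, stride 1, for illustration only):

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Direct 2D convolution: at every position, multiply the kernel with
    the sub-matrix beneath it elementwise and sum the products."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

img = np.arange(25, dtype=float).reshape(5, 5)
print(conv2d_valid(img, np.ones((3, 3)) / 9.0))  # 3x3 averaged output
```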
S42: and the batch standard layer is used for carrying out standardization processing on the characteristic data.
And the batch standard layer is used for carrying out batch standardized processing on the image data of the features after the two-dimensional convolution operation and accelerating network training.
S43: and the activation function layer performs linear activation on the normalized feature data, wherein a parameter correction linear unit ReLu is selected as an activation function.
The activation function layer is used for activating the characteristics after batch standardization, purposefully expressing useful picture characteristic information, and can select a nonlinear ReLu function as the activation function to train data after the convolutional layer. By means of a given ReLu filter, useful information larger than a certain threshold is activated, and useful information smaller than the threshold is suppressed, and the activation formula of the ReLu filter can be activec max (0, Converc). The Active is the corresponding coordinate characteristic data after activation, the coordinate data matrix after final activation is marked as Active, max (0, convert) is the activation function, namely the value in the matrix is filtered by taking the threshold value as 0, and the maximum value of the current value and the threshold value is taken, so that the characteristics of stimulation and inhibition of the human body mechanism signals are better met. And performing pooling dimension reduction operation on the activated Active data matrix, so that the feature calculation efficiency is improved, and the maximum pooling operation is adopted in the pooling dimension reduction operation.
S44: the second max pooling max-pooling processes the activation data to form a fused map image.
The second maximum pooling may compute a maximum for feature values extracted from the activation data and integrate the maximum to obtain a fused mapping image. The maximum value pooling process can reserve the maximum value in the activation data, reduce the deviation of the estimated mean value caused by parameter errors of the convolutional layer and reserve more characteristic information. Meanwhile, the second maximum pooling can reduce the scale of the fusion mapping image, simplify the complexity of calculation and reduce the phenomenon of overfitting to a certain extent. In other embodiments, the maximum pooling layer may be replaced with an average pooling layer and a random pooling layer.
S45: and the full connection layer classifies the fusion mapping images output by the last group of second maximum pooling layers and outputs a classification result.
The fusion mapping image output by the last group of second maximum pooling layers can be classified by using the full-link layer, so that a classification prediction result is obtained.
Performing maximum pooling processing on the activation data based on a second maximum pooling function to obtain a fusion mapping image after dimension reduction; and inputting the fused mapping image subjected to the dimension reduction into a full Connected layers (FC) for classification processing, and generating a classification prediction result.
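For illustration, the whole classification network of this embodiment can be sketched as four Conv-BN-ReLU-MaxPool groups followed by a fully connected layer; the channel widths and the 40-class output (matching the test dataset described below) are assumptions:

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    # One group: convolution -> batch normalization -> ReLU -> max pooling.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),          # the group's second max-pooling layer
    )

classifier = nn.Sequential(
    conv_block(16, 32),    # 32x32 -> 16x16
    conv_block(32, 64),    # 16x16 -> 8x8
    conv_block(64, 128),   # 8x8   -> 4x4
    conv_block(128, 256),  # 4x4   -> 2x2
    nn.Flatten(),
    nn.Linear(256 * 2 * 2, 40),   # fully connected layer, 40 classes
)

fused = torch.randn(1, 16, 32, 32)  # fused image from the structuring step
logits = classifier(fused)          # (1, 40) class scores
```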
Based on the above embodiments, the method for classifying three-dimensional point clouds based on a deconvolution neural network can achieve good classification performance, so its classification performance was tested. In the test experiments, four settings of the number of local feature vectors were selected: 128, 256, 512, and 1024; the experimental results are shown in fig. 13.
As can be seen from fig. 13, even when the number of local feature vectors extracted from the original three-dimensional point cloud is reduced to 128, those 128 local feature vectors still retain the main structure of the original point cloud, so the point cloud can be classified accurately. The standard dataset used for the original three-dimensional point cloud classification contains 12311 three-dimensional point cloud objects from 40 classes, divided into two parts: 9843 training samples and 2468 test samples. For the different numbers of local feature vectors, the classification accuracy of the method for classifying three-dimensional point clouds based on a deconvolution neural network is as follows:
when the number of the local feature vectors is 128, the classification accuracy is 87.6%;
when the number of the local feature vectors is 256, the classification accuracy is 88.2%;
when the number of the local feature vectors is 512, the classification accuracy is 88.4%;
when the number of the local feature vectors is 1024, the classification accuracy is 88.7%;
when the number of the local feature vectors is 2048, the classification accuracy is 89.9%.
The present application provides a method for classifying three-dimensional point clouds based on a deconvolution neural network, which overcomes the disorder, sparsity, and limitations of raw three-dimensional point clouds, realizes effective classification of three-dimensional point clouds based on the deconvolution neural network, and achieves high classification accuracy.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an embodiment of the image classification device based on the deconvolution neural network according to the present application, where the image classification device 50 includes a memory 51 and a processor 52, and the processor 52 is coupled to the memory 51.
The processor 52 cooperates with the memory 51 to implement the image classification method based on a deconvolution neural network described above.
In operation, the processor 52 cooperates with the memory 51 to implement the method for classifying a three-dimensional point cloud based on a deconvolution neural network; the processor 52 is configured to classify the fused image through the classification network to realize the classification of the three-dimensional point cloud.
The classification network comprises at least one cascaded group consisting of a convolution layer, a batch normalization layer, an activation function layer, and a second max-pooling layer, plus a fully connected layer connected to the second max-pooling layer of the last group.
The processor 52 is configured to perform, through the convolution layer, a two-dimensional convolution operation on the fused image, or on the fused map image output by the previous group's second max-pooling layer, as the input image, to extract feature data.
The processor 52 is configured to normalize the feature data via the batch normalization layer.
The processor 52 is configured to linearly activate the normalized feature data through the activation function layer, where the rectified linear unit (ReLU) is selected as the activation function.
The processor 52 is configured to max-pool the activation data via the second max-pooling layer to form a fused map image.
The processor 52 is configured to classify, through the fully connected layer, the fused map image output by the last group's second max-pooling layer, and to output the classification result.
The present application provides an image classification device 50 based on a deconvolution neural network, which overcomes the disorder, sparsity, and limitations of raw three-dimensional point clouds, realizes effective classification of three-dimensional point clouds based on the deconvolution neural network, and achieves high classification accuracy.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an embodiment of the apparatus with storage function 60 of the present application, in which the apparatus 60 stores program data 61, and when the program data 61 is executed, the method for structuring a three-dimensional point cloud based on a deconvolution neural network as in the above-mentioned embodiment can be implemented, or when the program data 61 is executed, the method for classifying a three-dimensional point cloud based on a deconvolution neural network as in the above-mentioned embodiment can be implemented.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other similar elements in a process, method, article, or apparatus that comprises the element.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments can be implemented by hardware under the control of program instructions; the program data can be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The aforementioned storage medium includes media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only for the purpose of illustrating embodiments of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application or are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (10)

1. A method for structuring a three-dimensional point cloud based on a deconvolution neural network, the method comprising:
carrying out feature extraction processing on the original three-dimensional point cloud through a local feature extraction network so as to obtain a local feature vector of a local area taking a plurality of points of the three-dimensional point cloud as a central point;
performing deconvolution mapping processing on the local feature vector corresponding to each central point through the deconvolution neural network to obtain a plurality of local feature mapping maps, wherein the local feature mapping maps correspond to the local feature vectors one by one;
and performing maximum pooling on the local feature maps through a first maximum pooling layer to obtain a fused image.
2. The method of claim 1, wherein the local feature extraction network comprises at least one aggregate abstraction layer, wherein the aggregate abstraction layer comprises a sampling layer, a grouping layer, and a pointnet layer;
the step of carrying out feature extraction processing on the original three-dimensional point cloud through the local feature extraction network comprises the following steps:
the sampling layer takes the original three-dimensional point cloud or a central point set output by the set abstraction layer at the previous stage as an input point cloud, and a first point cloud subset is sampled from the input point cloud by utilizing a farthest point sampling algorithm;
the grouping layer searches for adjacent area points taking the points in the first point cloud subset as the central point from the input point cloud to form a plurality of second point cloud subsets;
and the pointnet layer respectively extracts the features of the second point cloud subsets to obtain local feature vectors corresponding to the central points.
3. The method of claim 2, wherein the local feature extraction network comprises at least two collection abstraction layers arranged in cascade, wherein a first collection abstraction layer uses the original three-dimensional point cloud as the input point cloud, and a subsequent collection abstraction layer uses the set of the central points processed by the previous collection abstraction layer as the input point cloud.
4. The method of claim 1, wherein the deconvolution neural network comprises at least one deconvolution layer;
the step of performing deconvolution mapping processing on the local feature vector corresponding to each central point through the deconvolution neural network includes:
and the deconvolution layer takes an up-sampling image obtained by up-sampling the central point or a local feature mapping image output by the previous deconvolution layer as an input image, performs deconvolution mapping processing on the input image, and outputs the processed local feature mapping image.
5. The method of claim 4, wherein the deconvolution neural network includes at least one depth residual layer;
the step of performing deconvolution mapping processing on the local feature vector corresponding to each central point through the deconvolution neural network further includes:
and performing depth residual optimization on the processed local feature mapping image through the depth residual layer.
6. The method of claim 5, wherein the deconvolution neural network comprises the deconvolution layers and depth residual layers arranged in an alternating cascade, wherein a first one of the deconvolution layers uses an upsampled image obtained by upsampling the central point as the input image, and a subsequent one of the deconvolution layers uses the local feature map subjected to depth residual optimization by the previous depth residual layer as the input image.
7. A method for classifying a three-dimensional point cloud based on a deconvolution neural network, the method comprising the method for structuring a three-dimensional point cloud based on a deconvolution neural network according to any one of claims 1 to 6; and
and classifying the fused image through a classification network so as to realize the classification of the three-dimensional point cloud.
8. The method for classifying a three-dimensional point cloud based on a deconvolution neural network of claim 7, wherein the classification network comprises at least one cascaded group of a convolution layer, a batch normalization layer, an activation function layer and a second maximum pooling layer, and a fully connected layer connected to the second maximum pooling layer of the last group;
the step of classifying the fused image through a classification network comprises:
the convolution layer performs a two-dimensional convolution operation on the fused image, or on the fused map image output by the second maximum pooling layer of the previous group, as an input image, so as to extract feature data;
the batch normalization layer normalizes the feature data;
the activation function layer linearly activates the normalized feature data, wherein a rectified linear unit (ReLU) is selected as the activation function;
the second maximum pooling layer performs maximum pooling on the activation data to form the fused map image;
and the fully connected layer classifies the fused map image output by the second maximum pooling layer of the last group and outputs a classification result.
9. An image classification device based on a deconvolution neural network, the device comprising a memory, a processor, the processor coupled to the memory;
the processor, in cooperation with the memory, is operative to implement the method for classifying a three-dimensional point cloud based on a deconvolution neural network of any one of claims 7-8.
10. An apparatus having a storage function, characterized in that the apparatus stores program data which, when executed, is capable of implementing the method of structuring a three-dimensional point cloud based on a deconvolution neural network according to any one of claims 1 to 6, or which, when executed, is capable of implementing the method of classifying a three-dimensional point cloud based on a deconvolution neural network according to any one of claims 7 to 8.
CN201910960562.3A — 2019-10-10 — Three-dimensional point cloud structuring method, classification method, equipment and device — Pending — CN110956194A

Priority Application (1)

Application Number: CN201910960562.3A — Priority date: 2019-10-10 — Filing date: 2019-10-10
Title: Three-dimensional point cloud structuring method, classification method, equipment and device

Publication (1)

Publication Number: CN110956194A — Publication Date: 2020-04-03

Family ID: 69976351 — Country: CN (China)


Patent Citations (7)

* Cited by examiner, † Cited by third party

US20130022241A1 * — 2011-07-22 / 2013-01-24 — Raytheon Company — Enhancing GMAPD LADAR images using 3-D Wallis statistical differencing
US20190122378A1 * — 2017-04-17 / 2019-04-25 — The United States of America, as represented by the Secretary of the Navy — Apparatuses and methods for machine vision systems including creation of a point cloud model and/or three dimensional model based on multiple images from different perspectives and combination of depth cues from camera motion and defocus with various applications including navigation systems, and pattern matching systems as well as estimating relative blur between images for use in depth from defocus or autofocusing applications
WO2019060125A1 * — 2017-09-22 / 2019-03-28 — Zoox, Inc. — Three-dimensional bounding box from two-dimensional image and point cloud data
CN108416318A * — 2018-03-22 / 2018-08-17 — University of Electronic Science and Technology of China — Synthetic aperture radar image target deep-model recognition method based on data augmentation
CN108717569A * — 2018-05-16 / 2018-10-30 — Army Engineering University of PLA — Dilated fully convolutional neural network and construction method thereof
CN109389671A * — 2018-09-25 / 2019-02-26 — Nanjing University — Single-image three-dimensional reconstruction method based on a multi-stage neural network
CN109410321A * — 2018-10-17 / 2019-03-01 — Dalian University of Technology — Three-dimensional reconstruction method based on convolutional neural networks

Non-Patent Citations (1)

FAN CHEN: "PTINet: Converting 3D Points to 2D Images with Deconvolution for Point Cloud Classification" *

Cited By (2)

* Cited by examiner, † Cited by third party

CN111860668A * — 2020-07-27 / 2020-10-30 — Liaoning Technical University — Point cloud identification method of a deep convolutional network for raw 3D point cloud processing
CN111860668B — 2020-07-27 / granted 2024-04-02 — Liaoning Technical University — Point cloud identification method of a deep convolutional network for raw 3D point cloud processing


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination