CN110956194A - Three-dimensional point cloud structuring method, classification method, equipment and device - Google Patents


Info

Publication number
CN110956194A
CN110956194A
Authority
CN
China
Prior art keywords
layer
point cloud
local feature
deconvolution
neural network
Prior art date
2019-10-10
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910960562.3A
Other languages
Chinese (zh)
Inventor
梁国远 (Liang Guoyuan)
陈帆 (Chen Fan)
周翊民 (Zhou Yimin)
何升展 (He Shengzhan)
吴新宇 (Wu Xinyu)
冯伟 (Feng Wei)
武臻 (Wu Zhen)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN201910960562.3A
Publication of CN110956194A
Current legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition; G06F18/20 Analysing; G06F18/24 Classification techniques
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models; G06N3/02 Neural networks; G06N3/04 Architecture, e.g. interconnection topology; G06N3/045 Combinations of networks
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration; G06T5/50 using two or more images, e.g. averaging or subtraction
    • G06T2207/00 Indexing scheme for image analysis or image enhancement; G06T2207/10028 Range image; depth image; 3D point clouds
    • G06T2207/20084 Artificial neural networks [ANN]


Abstract

The application relates to the technical field of neural network processing, and discloses a three-dimensional point cloud structuring method, a three-dimensional point cloud classification method, equipment, and a device. The method comprises the following steps: extracting features from the original three-dimensional point cloud through a local feature extraction network to obtain local feature vectors of local regions centered on a plurality of points of the point cloud; performing deconvolution mapping on the local feature vector of each center point through a deconvolution neural network to obtain a plurality of local feature maps, the local feature maps corresponding one-to-one to the local feature vectors; and max-pooling the plurality of local feature maps through a first max-pooling layer to obtain a fused image. By this method, an unstructured original three-dimensional point cloud can be converted into structured image data.

Description

Three-dimensional point cloud structuring method, classification method, equipment and device
Technical Field
The present application relates to the field of neural network processing technologies, and in particular, to a three-dimensional point cloud structuring method, a classification method, equipment, and a device.
Background
Research on deep learning models for three-dimensional shapes started later than in the field of two-dimensional images. Images are structured and can be represented as matrices on a two-dimensional plane, but three-dimensional point clouds and meshes are unstructured and cannot be fed directly into a deep neural network.
Disclosure of Invention
The technical problem mainly addressed by the present application is to provide a three-dimensional point cloud structuring method, a classification method, equipment, and a device that can convert an unstructured original three-dimensional point cloud into structured image data.
In one aspect, the present application provides a method for structuring a three-dimensional point cloud based on a deconvolution neural network, the method including: extracting features from the original three-dimensional point cloud through a local feature extraction network to obtain local feature vectors of local regions centered on a plurality of points of the point cloud; performing deconvolution mapping on the local feature vector of each center point through the deconvolution neural network to obtain a plurality of local feature maps, the local feature maps corresponding one-to-one to the local feature vectors; and max-pooling the plurality of local feature maps through a first max-pooling layer to obtain a fused image.
In another aspect, the present application provides a method for classifying a three-dimensional point cloud based on a deconvolution neural network, the method including the method for structuring a three-dimensional point cloud based on a deconvolution neural network as described above, and classifying the fused image through a classification network to realize the classification of the three-dimensional point cloud.
In yet another aspect, the present application provides an image classification device based on a deconvolution neural network, the device including a memory and a processor coupled to the memory; the processor cooperates with the memory to implement the above method for classifying a three-dimensional point cloud based on a deconvolution neural network.
In still another aspect, the present application provides an apparatus having a storage function, the apparatus storing program data that, when executed, implements the above method for structuring a three-dimensional point cloud based on a deconvolution neural network, or the above method for classifying a three-dimensional point cloud based on a deconvolution neural network.
The beneficial effects of the present application are as follows. Different from the prior art, the present application extracts features from the original three-dimensional point cloud through a local feature extraction network to obtain local feature vectors of local regions centered on a plurality of points of the point cloud; performs deconvolution mapping on the local feature vector of each center point through a deconvolution neural network, which automatically learns the projection mapping from points to images and retains local feature information useful for three-dimensional point cloud classification, obtaining a plurality of local feature maps in one-to-one correspondence with the local feature vectors; and max-pools the plurality of local feature maps through a first max-pooling layer to obtain a fused image. Extracting features from the three-dimensional point cloud with this deconvolution-based neural network converts the unstructured original point cloud into structured image data and markedly improves the robustness of the obtained features.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed for describing the embodiments are briefly introduced below. The drawings in the following description are only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort. In the drawings:
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a method for structuring a three-dimensional point cloud based on a deconvolution neural network according to the present application;
FIG. 2 is a schematic flow chart of another embodiment of the structuring method of the three-dimensional point cloud based on the deconvolution neural network according to the present application;
FIG. 3 is a schematic flow chart of an embodiment of the classification method for three-dimensional point cloud based on deconvolution neural network according to the present application;
FIG. 4 is a schematic flow chart of another embodiment of the classification method of the three-dimensional point cloud based on the deconvolution neural network according to the present application;
FIG. 5 is a schematic structural diagram of an embodiment of an image classification device 50 based on a deconvolution neural network according to the present application;
FIG. 6 is a schematic structural diagram of an embodiment of an apparatus with a storage function according to the present application;
FIG. 7 is a schematic structural diagram of an embodiment of a method for structuring a three-dimensional point cloud based on a deconvolution neural network according to the present application;
FIG. 8 is a schematic diagram of a local feature extraction network according to the present application;
FIG. 9 is a schematic diagram of another structure of a local feature extraction network according to the present application;
FIG. 10 is a schematic structural diagram of another embodiment of the method for structuring a three-dimensional point cloud based on a deconvolution neural network according to the present application;
FIG. 11 is a schematic diagram of a deconvolution neural network of the present application;
FIG. 12 is a schematic diagram of another structure of a deconvolution neural network of the present application;
FIG. 13 shows experimental results of the classification method for three-dimensional point clouds based on a deconvolution neural network according to the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1 and 7, fig. 1 is a schematic flowchart of an embodiment of a method for structuring a three-dimensional point cloud based on a deconvolution neural network, and fig. 7 is a schematic structural diagram of an embodiment of the method for structuring a three-dimensional point cloud based on a deconvolution neural network. The application provides a three-dimensional point cloud structuring method based on a deconvolution neural network, which comprises the following steps:
s11: and performing feature extraction processing on the original three-dimensional point cloud through a local feature extraction network to further obtain a local feature vector of a local area taking a plurality of points of the three-dimensional point cloud as a central point.
Specifically, the original three-dimensional point cloud may be: the 3D scanner scans a target object to obtain a set of points, and the 3D scanner may be a handheld 3D scanner, a point cloud camera, or the like.
In this step, the local feature extraction network may adopt a farthest point sampling algorithm (FPS) to select multiple points in the three-dimensional point cloud as a central point, and take an adjacent area with a radius r on a space where the central point is located, where the points in the adjacent area are a local area set of the central point. And then, the local region set of the central point is encoded into a local feature vector through a micro-dot net layer.
S12: and performing deconvolution mapping processing on the local feature vector corresponding to each central point through a deconvolution neural network to obtain a plurality of local feature mapping maps, wherein the local feature mapping maps correspond to the local feature vectors one by one.
In particular, the deconvolution neural network may have a plurality of convolutional layers and a first maximum pooling layer connected to the plurality of convolutional layers. The deconvolution neural network can adopt a series of connected deconvolution layers, each deconvolution layer is learned on the basis of a local feature mapping graph learned by the last deconvolution layer, and the size of the local feature mapping graph is continuously increased, so that the size of the local feature mapping graph is reduced.
In the application, local feature vectors corresponding to each central point can be subjected to up-sampling learning and object edge optimization by using a local sensitive deconvolution neural network to obtain a local feature map corresponding to each local feature vector, and the local feature map can be at least one of an RGB score map or a depth score map.
S13: and performing maximum pooling on the plurality of local feature maps through the first maximum pooling layer to obtain a fused image.
Specifically, the first maximum value pooling layer may calculate a maximum value from feature values extracted from each neighborhood of the plurality of local feature maps, and integrate the maximum values to obtain a fused image. The maximum value pooling treatment can reserve the maximum value in the local feature mapping image, reduce the deviation of the estimated mean value caused by the parameter error of the deconvolution layer and reserve more feature information. Meanwhile, the first maximum pooling layer can reduce the scale of the input image, simplify the complexity of calculation and reduce the phenomenon of overfitting to a certain extent. In other embodiments, the maximum pooling layer may be replaced with an average pooling layer and a random pooling layer.
Different from the prior art, the present application extracts features from the original three-dimensional point cloud through a local feature extraction network to obtain local feature vectors of local regions centered on a plurality of points of the point cloud; performs deconvolution mapping on the local feature vector of each center point through a deconvolution neural network, which automatically learns the projection mapping from points to images and retains local feature information useful for three-dimensional point cloud classification, obtaining a plurality of local feature maps in one-to-one correspondence with the local feature vectors; and max-pools the plurality of local feature maps through the first max-pooling layer to obtain a fused image, thereby realizing the conversion of the unstructured original three-dimensional point cloud into structured image data.
In one embodiment, referring to fig. 2 and figs. 8-10, fig. 2 is a schematic flowchart of another embodiment of the method for structuring a three-dimensional point cloud based on a deconvolution neural network according to the present application. Fig. 8 is a schematic structural diagram of a local feature extraction network of the present application, fig. 9 is a schematic structural diagram of another local feature extraction network of the present application, and fig. 10 is a schematic structural diagram of another embodiment of the method for structuring a three-dimensional point cloud based on a deconvolution neural network of the present application. The local feature extraction network comprises at least one set abstraction layer, and the set abstraction layer comprises three key layers: a sampling layer, a grouping layer, and a pointnet layer.
Specifically, the sampling layer takes the original three-dimensional point cloud, or the set of center points output by the previous set abstraction layer, as the input point cloud and samples a first point cloud subset from it; the first point cloud subset defines the centroids of the local regions. The grouping layer forms a plurality of second point cloud subsets by finding, in the input point cloud, the neighborhood points around each point of the first point cloud subset taken as a center point. The pointnet layer encodes each second point cloud subset into a local feature vector.
The set abstraction layer takes an N × (d + C) matrix as the input point cloud: N points, each with d-dimensional coordinates and a C-dimensional feature. It outputs an N′ × (d + C′) matrix: N′ points with d-dimensional coordinates and new C′-dimensional feature vectors.
Step S11 includes the following steps:
s21: the sampling layer takes the original three-dimensional point cloud or a central point set output by a previous level set abstraction layer as an input point cloud, and a first point cloud subset is sampled from the input point cloud by utilizing a farthest point sampling algorithm.
In particular, the sampling layer may be a set of central points { x ] output as an original three-dimensional point cloud or via a previous level set abstraction layer1,x2,...,xnUsing the point cloud as an input point cloud, and selecting a first point cloud subset { x ] from the input point cloud by using a furthest point sampling algorithmi1,xi2,...,ximIn which xijIs a subset of distance point clouds { xi1,xi2,...,xij-1The farthest point.
The above farthest point sampling algorithm is specifically: first, randomly selecting a central point set { x }1,x2,...,xnOne point in the lattice, then selecting the point farthest away from the point as the starting point, and repeating the above process until the starting point is reachedUntil the required number is selected, finally sampling a first point cloud subset { xi1,xi2,...,xim}. Compared with random sampling, the farthest point sampling algorithm can more completely sample all point clouds through the central point set.
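A minimal NumPy sketch of the farthest point sampling procedure just described follows; the function name and the random starting point are illustrative choices:

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, m: int) -> np.ndarray:
    """Pick m center points from an (n, 3) cloud by repeatedly taking the
    point farthest from everything selected so far."""
    n = points.shape[0]
    selected = np.zeros(m, dtype=np.int64)
    dist = np.full(n, np.inf)           # distance to nearest selected point
    selected[0] = np.random.randint(n)  # random starting point, as above
    for i in range(1, m):
        diff = points - points[selected[i - 1]]
        dist = np.minimum(dist, np.einsum('ij,ij->i', diff, diff))
        selected[i] = int(np.argmax(dist))
    return selected

cloud = np.random.rand(1024, 3)
centers = cloud[farthest_point_sampling(cloud, 128)]  # 128 center points
```

Keeping a running minimum distance to the selected set makes each iteration O(n), so sampling m centers costs O(nm).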
S22: the grouping layer searches for adjacent area points taking the points in the first point cloud subset as central points from the input point cloud to form a plurality of second point cloud subsets.
In particular, the grouping layer is derived from the input point cloud { x1,x2,...,xnFind with the first point cloud subset { x }i1,xi2,...,ximThe points are K adjacent area points of a center point, wherein the size of the center point is N' × d. And the grouping layer outputs a plurality of groups of second point cloud subsets with the size of N' × K × (d + C), and each group of second point cloud subsets corresponds to a local area.
It should be noted that the K value of each set of second point cloud subsets may be different, and the mesh layer can convert a flexible number of neighboring region points into fixed-length local feature vectors.
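The grouping step can be sketched as a ball query: for each center point, collect up to K points within radius r. Padding short groups by repeating the first neighbor is an assumption (a common PointNet++ convention) so that every group has a fixed size:

```python
import numpy as np

def ball_query(points: np.ndarray, centers: np.ndarray,
               radius: float, k: int) -> np.ndarray:
    """For each center, gather indices of up to k points within `radius`.
    Short groups are padded by repeating their first neighbor (assumed
    convention). Each center is assumed to be a point of the cloud, so a
    group is never empty."""
    groups = []
    for c in centers:
        d2 = np.sum((points - c) ** 2, axis=1)
        idx = np.flatnonzero(d2 <= radius ** 2)[:k]
        if idx.size < k:
            idx = np.concatenate([idx, np.full(k - idx.size, idx[0])])
        groups.append(idx)
    return np.stack(groups)                 # (N', k) indices

cloud = np.random.rand(1024, 3)
centers = cloud[:16]                        # pretend these came from FPS
groups = cloud[ball_query(cloud, centers, radius=0.2, k=32)]  # (16, 32, 3)
```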
S23: and respectively extracting the features of the second point cloud subset by the dot network layer to obtain a local feature vector corresponding to the central point.
Specifically, the point net layer is used as a basic building block for local model learning, and point-to-point relations in a local area can be captured through the coordinates of the central point and the local feature vector of the central point.
And inputting a plurality of groups of second point cloud subsets with the size of N ' × K × (d + C) in the dot net layer, taking each group of second point cloud subsets as a local area, respectively extracting the features of the second point cloud subsets, and outputting local feature vectors with the size of N ' × (d + C '). The method specifically comprises the following steps of converting point coordinates in a local area into local feature vectors relative to a central point: x is the number ofi (j)=xi (j)-x^(j)Wherein i is 1, 2, …, K; j is 1, 2, …, d, where x is the coordinate of the center point.
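A compact sketch of such a pointnet layer, assuming d = 3 coordinates and illustrative channel widths: each group is translated into its center's local frame (the formula above), passed through a shared per-point MLP, and reduced by a max over the K points:

```python
import torch
import torch.nn as nn

class PointNetLayer(nn.Module):
    """Encode each (K, d) local group into one fixed-length feature vector:
    translate to the center's local frame, apply a shared per-point MLP,
    then take the max over the K neighborhood points."""
    def __init__(self, in_dim: int = 3, out_dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 32), nn.ReLU(),
            nn.Linear(32, out_dim), nn.ReLU(),
        )

    def forward(self, groups: torch.Tensor, centers: torch.Tensor) -> torch.Tensor:
        # x_i^(j) = x_i^(j) - x_hat^(j): coordinates relative to the center
        local = groups - centers.unsqueeze(1)   # (N', K, d)
        feats = self.mlp(local)                 # (N', K, out_dim)
        return feats.max(dim=1).values          # (N', out_dim)

layer = PointNetLayer()
vectors = layer(torch.randn(16, 32, 3), torch.randn(16, 3))  # (16, 64)
```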
In an embodiment, the local feature extraction network may include at least two collection abstraction layers arranged in cascade, where a first collection abstraction layer uses an original three-dimensional point cloud as an input point cloud, and a subsequent collection abstraction layer uses a collection of central points processed by a previous collection abstraction layer as an input point cloud.
Specifically, the local feature extraction network may include a first set abstraction layer and a second set abstraction layer arranged in cascade, where each set abstraction layer includes three key layers: a sampling layer, a grouping layer and a dot network layer, wherein the first collection abstraction layer takes the original three-dimensional point cloud as an input point cloud and executes the steps S21-S23 in the embodiment; the second collection abstraction layer uses the collection of the center points processed by the previous collection abstraction layer as the input point cloud, and repeatedly performs steps S21-S23 in the above embodiment.
In this embodiment, the more representative points are selected as the center points of the local regions by the farthest point sampling algorithm; K neighborhood points are then gathered around each center point, the K neighborhood points are treated as one local region, and the pointnet layer extracts features from that region to obtain the local feature vector of the center point. Each set abstraction layer performs sampling, grouping, and feature extraction in turn; repeating this process, with the set of center points produced by one set abstraction layer serving as the input point cloud of the next, realizes hierarchical, iterative extraction and yields the target number of local feature vectors.
Fig. 11 is a schematic diagram of a structure of the deconvolution neural network of the present application. In one embodiment, the deconvolution neural network includes at least one deconvolution layer.
Step S12 includes: the deconvolution layer takes an up-sampling image obtained by up-sampling the central point or a local feature mapping image output by the previous deconvolution layer as an input image, performs deconvolution mapping processing on the input image, and outputs a processed local feature mapping image.
Specifically, the center point is upsampled to obtain an upsampled image, and the upsampling is used for improving the image resolution. The local feature vector corresponding to each central point can be regarded as a feature map with the size of 1 × 1, a single up-sampling structural unit is constructed for this purpose, and the local feature vectors are up-sampled to obtain an up-sampled image with the size of 2 × 2. The process of upsampling is also similar to that of a convolution, except that the input features are interpolated to a larger feature map prior to convolution and then convolved.
The parameters of the deconvolution layer can be set to be twice of the upsampled image, and the learned convolution kernel in the deconvolution layer corresponds to the basic size of the local feature mapping map, so that the structuring of the three-dimensional point cloud based on the deconvolution neural network is realized. The deconvolution layer may use, as an input image, an upsampled image obtained by upsampling a central point or a local feature map output by a previous-stage deconvolution layer, wherein the deconvolution mapping process is an inverse operation of the convolution mapping process.
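A sketch of one such stage under assumed channel counts: each local feature vector is viewed as a 1 × 1 map, interpolated to 2 × 2, and passed through a transposed convolution whose stride-2/kernel-4/padding-1 setting doubles the spatial size:

```python
import torch
import torch.nn as nn

vec = torch.randn(128, 256)        # 128 local feature vectors (assumed width)
fmap = vec.view(128, 256, 1, 1)    # each vector viewed as a 1x1 feature map

upsample = nn.Upsample(scale_factor=2, mode='nearest')  # 1x1 -> 2x2
deconv = nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1)

x = upsample(fmap)                 # (128, 256, 2, 2) upsampled images
x = deconv(x)                      # (128, 128, 4, 4): deconv doubles the size
```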
Fig. 12 is a schematic diagram of a structure of the deconvolution neural network of the present application. In one embodiment, the deconvolution neural network includes at least one depth residual layer.
Step S12 further includes the steps of: and performing depth residual optimization on the processed local feature mapping image through a depth residual layer.
Specifically, the depth residual layer is a residual neural network (ResNet). The depth of a deconvolution neural network matters greatly to its performance, so ideally the network should be as deep as possible, provided it does not overfit. In practice, however, as the depth grows, the gradients in earlier layers tend to vanish (gradient dispersion), the model becomes hard to optimize, and the accuracy of the network can actually drop. Put differently, deepening the network leads to a degradation problem: accuracy first rises, then saturates, and then decreases as depth increases further. This shows that once a plain network reaches a certain depth, its performance saturates and then degrades; the degradation is not caused by overfitting, since both training and test accuracy fall, but indicates that a sufficiently deep plain network is simply hard to train. The depth residual layer is used to solve this performance degradation of deep networks.
In this application, the depth residual layer performs depth residual optimization on the processed local feature map; because the depth residual layer introduces a residual network structure, it alleviates the gradient dispersion caused by a very deep stack of deconvolution layers and improves the accuracy of the structuring of the three-dimensional point cloud.
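A minimal sketch of such a depth residual layer, in the style of a basic ResNet block (the exact layer composition inside the block is an assumption):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A basic ResNet-style block: two 3x3 convolutions plus an identity
    shortcut, used here to refine each local feature map."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The shortcut keeps gradients flowing through deep stacks, which is
        # what counters the gradient-dispersion/degradation problems above.
        return torch.relu(x + self.body(x))
```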
In one embodiment, the deconvolution neural network comprises deconvolution layers and depth residual layers which are alternately arranged in a cascade manner, wherein the first deconvolution layer takes an up-sampled image obtained by up-sampling a central point as an input image, and the subsequent deconvolution layers take a local feature mapping image subjected to depth residual optimization by the previous depth residual layer as an input image.
Specifically, the deconvolution neural network includes deconvolution layers and depth residual layers arranged in an alternating cascade. The plurality of deconvolution layers may include a first deconvolution layer, a second deconvolution layer, a third deconvolution layer, and a fourth deconvolution layer, and the plurality of depth residual layers may include a first depth residual layer, a second depth residual layer, a third depth residual layer, and a fourth depth residual layer.
Each local feature vector is upsampled to obtain an upsampled image; the upsampled images correspond one-to-one to the local feature vectors. An upsampled image is input to the first deconvolution layer, which performs deconvolution mapping on it and outputs a first local feature map twice the size of the upsampled image. The first depth residual layer performs depth feature extraction on the first local feature map. The result is input to the second deconvolution layer, which performs deconvolution mapping and outputs a second local feature map twice the size of the first. The second depth residual layer performs depth feature extraction on the second local feature map. The result is input to the third deconvolution layer, which performs deconvolution mapping and outputs a third local feature map twice the size of the second. The third depth residual layer performs depth feature extraction on the third local feature map. The result is input to the fourth deconvolution layer, which performs deconvolution mapping and outputs a fourth local feature map twice the size of the third. The fourth depth residual layer performs depth feature extraction on the fourth local feature map; the final local feature map is the fourth local feature map after this depth feature extraction.
The following is further described in detail in accordance with the above embodiments of the present application:
The original three-dimensional point cloud is processed by the local feature extraction network to obtain local feature vectors of local regions centered on a plurality of points of the point cloud; the number of local feature vectors may be 128. The local feature vector of each center point can be regarded as a 1 × 1 feature map; a single upsampling unit is constructed for this purpose, and the local feature vectors are upsampled into 128 upsampled images of size 2 × 2. The 128 upsampled images of size 2 × 2 are input to a deconvolution neural network comprising four deconvolution layers and four depth residual layers in an alternating cascade:
1. the first deconvolution layer outputs 128 first local feature maps of size 4 × 4, and the first depth residual layer performs depth feature extraction on them;
2. the second deconvolution layer outputs 128 second local feature maps of size 8 × 8, and the second depth residual layer performs depth feature extraction on them;
3. the third deconvolution layer outputs 128 third local feature maps of size 16 × 16, and the third depth residual layer performs depth feature extraction on them;
4. the fourth deconvolution layer finally outputs 128 fourth local feature maps of size 32 × 32, and the fourth depth residual layer performs depth feature extraction on them.
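Putting the pieces together, a sketch of a four-stage decoder reproducing the 2 × 2 → 32 × 32 growth of steps 1-4 above; the channel widths are assumptions, and the residual block repeats the earlier sketch so the snippet stays self-contained:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):   # repeated from the sketch above
    def __init__(self, ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch),
        )
    def forward(self, x):
        return torch.relu(x + self.body(x))

def stage(in_ch: int, out_ch: int) -> nn.Sequential:
    # One deconvolution (doubles H and W) followed by residual refinement.
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, 4, stride=2, padding=1),
        ResidualBlock(out_ch),
    )

decoder = nn.Sequential(
    stage(256, 128),   # 2x2   -> 4x4   (step 1)
    stage(128, 64),    # 4x4   -> 8x8   (step 2)
    stage(64, 32),     # 8x8   -> 16x16 (step 3)
    stage(32, 16),     # 16x16 -> 32x32 (step 4)
)

x = torch.randn(128, 256, 2, 2)   # the 128 upsampled 2x2 feature maps
maps = decoder(x)                 # (128, 16, 32, 32) local feature maps
fused = maps.max(dim=0).values    # first max-pooling layer -> fused image
```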
Referring to fig. 3, fig. 3 is a schematic flowchart of an embodiment of the classification method for three-dimensional point cloud based on deconvolution neural network according to the present application, and the method includes the following steps:
s31: the method for structuring the three-dimensional point cloud based on the deconvolution neural network is as described above.
Please specifically refer to the above embodiment of the method for structuring a three-dimensional point cloud based on a deconvolution neural network, which is not described herein again.
S32: and classifying the fusion image through a classification network so as to realize the classification of the three-dimensional point cloud.
Specifically, the fused image obtained in step S13 is classified by the classification network to obtain the category of the three-dimensional point cloud.
The present application provides a method for classifying three-dimensional point clouds based on a deconvolution neural network, which overcomes the disorder, sparsity, and limitations of raw three-dimensional point clouds, realizes effective classification of three-dimensional point clouds based on the deconvolution neural network, and achieves high classification accuracy.
In an embodiment, referring to fig. 4, fig. 4 is a schematic flowchart of another embodiment of the method for classifying a three-dimensional point cloud based on a deconvolution neural network according to the present application. The classification network comprises at least one cascaded group consisting of a convolution layer, a batch normalization layer, an activation function layer, and a second max-pooling layer, plus a fully connected layer connected to the second max-pooling layer of the last group.
Specifically, the classification network may include a first convolution layer, a second convolution layer, a third convolution layer, and a fourth convolution layer arranged in cascade, where a batch normalization layer, an activation function layer, and a second max-pooling layer are connected in sequence after each convolution layer.
Step S32 includes the following steps:
s41: and the convolution layer performs two-dimensional convolution operation on the fusion image or the fusion mapping image output by the second maximum pooling layer of the previous group as an input image so as to extract the characteristic data.
Wherein the convolutional layer is used for executing two-dimensional convolution operation, and the two-dimensional convolution operation comprises: and carrying out counterpoint multiplication operation on the convolution kernel matrix data and the sub-matrix data of the convolution kernel matrix data at the current position to obtain a plurality of elements, and carrying out accumulation summation operation on the plurality of elements to obtain a convolution result of the current position. That is to say, in the embodiment of the present application, the submatrix operation unit performs convolution operation by using a bit-by-bit multiplication and summation method.
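A direct NumPy sketch of this elementwise multiply-and-sum convolution (valid padding, single channel, stride 1, for illustration only):

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Direct 2D convolution: at every position, multiply the kernel with
    the sub-matrix beneath it elementwise and sum the products."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

img = np.arange(25, dtype=float).reshape(5, 5)
print(conv2d_valid(img, np.ones((3, 3)) / 9.0))  # 3x3 averaged output
```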
S42: and the batch standard layer is used for carrying out standardization processing on the characteristic data.
And the batch standard layer is used for carrying out batch standardized processing on the image data of the features after the two-dimensional convolution operation and accelerating network training.
S43: and the activation function layer performs linear activation on the normalized feature data, wherein a parameter correction linear unit ReLu is selected as an activation function.
The activation function layer is used for activating the characteristics after batch standardization, purposefully expressing useful picture characteristic information, and can select a nonlinear ReLu function as the activation function to train data after the convolutional layer. By means of a given ReLu filter, useful information larger than a certain threshold is activated, and useful information smaller than the threshold is suppressed, and the activation formula of the ReLu filter can be activec max (0, Converc). The Active is the corresponding coordinate characteristic data after activation, the coordinate data matrix after final activation is marked as Active, max (0, convert) is the activation function, namely the value in the matrix is filtered by taking the threshold value as 0, and the maximum value of the current value and the threshold value is taken, so that the characteristics of stimulation and inhibition of the human body mechanism signals are better met. And performing pooling dimension reduction operation on the activated Active data matrix, so that the feature calculation efficiency is improved, and the maximum pooling operation is adopted in the pooling dimension reduction operation.
S44: the second max pooling max-pooling processes the activation data to form a fused map image.
The second maximum pooling may compute a maximum for feature values extracted from the activation data and integrate the maximum to obtain a fused mapping image. The maximum value pooling process can reserve the maximum value in the activation data, reduce the deviation of the estimated mean value caused by parameter errors of the convolutional layer and reserve more characteristic information. Meanwhile, the second maximum pooling can reduce the scale of the fusion mapping image, simplify the complexity of calculation and reduce the phenomenon of overfitting to a certain extent. In other embodiments, the maximum pooling layer may be replaced with an average pooling layer and a random pooling layer.
S45: and the full connection layer classifies the fusion mapping images output by the last group of second maximum pooling layers and outputs a classification result.
The fusion mapping image output by the last group of second maximum pooling layers can be classified by using the full-link layer, so that a classification prediction result is obtained.
Performing maximum pooling processing on the activation data based on a second maximum pooling function to obtain a fusion mapping image after dimension reduction; and inputting the fused mapping image subjected to the dimension reduction into a full Connected layers (FC) for classification processing, and generating a classification prediction result.
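For illustration, the whole classification network of this embodiment can be sketched as four Conv-BN-ReLU-MaxPool groups followed by a fully connected layer; the channel widths and the 40-class output (matching the test dataset described below) are assumptions:

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    # One group: convolution -> batch normalization -> ReLU -> max pooling.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),          # the group's second max-pooling layer
    )

classifier = nn.Sequential(
    conv_block(16, 32),    # 32x32 -> 16x16
    conv_block(32, 64),    # 16x16 -> 8x8
    conv_block(64, 128),   # 8x8   -> 4x4
    conv_block(128, 256),  # 4x4   -> 2x2
    nn.Flatten(),
    nn.Linear(256 * 2 * 2, 40),   # fully connected layer, 40 classes
)

fused = torch.randn(1, 16, 32, 32)  # fused image from the structuring step
logits = classifier(fused)          # (1, 40) class scores
```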
Based on the above embodiments, the method for classifying three-dimensional point clouds based on a deconvolution neural network can achieve good classification performance, so its classification performance was tested. In the test experiments, four settings of the number of local feature vectors were selected: 128, 256, 512, and 1024; the experimental results are shown in fig. 13.
As can be seen from fig. 13, even when the number of local feature vectors extracted from the original three-dimensional point cloud is reduced to 128, those 128 local feature vectors still retain the main structure of the original point cloud, so the point cloud can be classified accurately. The standard dataset used for the original three-dimensional point cloud classification contains 12311 three-dimensional point cloud objects from 40 classes, divided into two parts: 9843 training samples and 2468 test samples. For the different numbers of local feature vectors, the classification accuracy of the method for classifying three-dimensional point clouds based on a deconvolution neural network is as follows:
when the number of the local feature vectors is 128, the classification accuracy is 87.6%;
when the number of the local feature vectors is 256, the classification accuracy is 88.2%;
when the number of the local feature vectors is 512, the classification accuracy is 88.4%;
when the number of the local feature vectors is 1024, the classification accuracy is 88.7%;
when the number of the local feature vectors is 2048, the classification accuracy is 89.9%.
The present application provides a method for classifying three-dimensional point clouds based on a deconvolution neural network, which overcomes the disorder, sparsity, and limitations of raw three-dimensional point clouds, realizes effective classification of three-dimensional point clouds based on the deconvolution neural network, and achieves high classification accuracy.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an embodiment of the image classification device based on the deconvolution neural network according to the present application, where the image classification device 50 includes a memory 51 and a processor 52, and the processor 52 is coupled to the memory 51.
The processor 52 cooperates with the memory 51 to implement the image classification method based on a deconvolution neural network described above.
In operation, the processor 52 cooperates with the memory 51 to implement the method for classifying a three-dimensional point cloud based on a deconvolution neural network; the processor 52 is configured to classify the fused image through the classification network to realize the classification of the three-dimensional point cloud.
The classification network comprises at least one cascaded group consisting of a convolution layer, a batch normalization layer, an activation function layer, and a second max-pooling layer, plus a fully connected layer connected to the second max-pooling layer of the last group.
The processor 52 is configured to perform, through the convolution layer, a two-dimensional convolution operation on the fused image, or on the fused map image output by the previous group's second max-pooling layer, as the input image, to extract feature data.
The processor 52 is configured to normalize the feature data via the batch normalization layer.
The processor 52 is configured to linearly activate the normalized feature data through the activation function layer, where the rectified linear unit (ReLU) is selected as the activation function.
The processor 52 is configured to max-pool the activation data via the second max-pooling layer to form a fused map image.
The processor 52 is configured to classify, through the fully connected layer, the fused map image output by the last group's second max-pooling layer, and to output the classification result.
The present application provides an image classification device 50 based on a deconvolution neural network, which overcomes the disorder, sparsity, and limitations of raw three-dimensional point clouds, realizes effective classification of three-dimensional point clouds based on the deconvolution neural network, and achieves high classification accuracy.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an embodiment of the apparatus with storage function 60 of the present application, in which the apparatus 60 stores program data 61, and when the program data 61 is executed, the method for structuring a three-dimensional point cloud based on a deconvolution neural network as in the above-mentioned embodiment can be implemented, or when the program data 61 is executed, the method for classifying a three-dimensional point cloud based on a deconvolution neural network as in the above-mentioned embodiment can be implemented.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other similar elements in a process, method, article, or apparatus that comprises the element.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments can be implemented by hardware under the control of program instructions; the program data can be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The aforementioned storage medium includes media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only for the purpose of illustrating embodiments of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application or are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (10)

1. A method for structuring a three-dimensional point cloud based on a deconvolution neural network, the method comprising:
carrying out feature extraction processing on the original three-dimensional point cloud through a local feature extraction network so as to obtain a local feature vector of a local area taking a plurality of points of the three-dimensional point cloud as a central point;
performing deconvolution mapping processing on the local feature vector corresponding to each central point through the deconvolution neural network to obtain a plurality of local feature mapping maps, wherein the local feature mapping maps correspond to the local feature vectors one by one;
and performing maximum pooling on the local feature maps through a first maximum pooling layer to obtain a fused image.
2. The method of claim 1, wherein the local feature extraction network comprises at least one aggregate abstraction layer, wherein the aggregate abstraction layer comprises a sampling layer, a grouping layer, and a pointnet layer;
the step of carrying out feature extraction processing on the original three-dimensional point cloud through the local feature extraction network comprises the following steps:
the sampling layer takes the original three-dimensional point cloud or a central point set output by the set abstraction layer at the previous stage as an input point cloud, and a first point cloud subset is sampled from the input point cloud by utilizing a farthest point sampling algorithm;
the grouping layer searches for adjacent area points taking the points in the first point cloud subset as the central point from the input point cloud to form a plurality of second point cloud subsets;
and the pointnet layer respectively extracts the features of the second point cloud subsets to obtain local feature vectors corresponding to the central points.
3. The method of claim 2, wherein the local feature extraction network comprises at least two collection abstraction layers arranged in cascade, wherein a first collection abstraction layer uses the original three-dimensional point cloud as the input point cloud, and a subsequent collection abstraction layer uses the set of the central points processed by the previous collection abstraction layer as the input point cloud.
4. The method of claim 1, wherein the deconvolution neural network comprises at least one deconvolution layer;
the step of performing deconvolution mapping processing on the local feature vector corresponding to each central point through the deconvolution neural network includes:
and the deconvolution layer takes an up-sampling image obtained by up-sampling the central point or a local feature mapping image output by the previous deconvolution layer as an input image, performs deconvolution mapping processing on the input image, and outputs the processed local feature mapping image.
5. The method of claim 4, wherein the deconvolution neural network includes at least one depth residual layer;
the step of performing deconvolution mapping processing on the local feature vector corresponding to each central point through the deconvolution neural network further includes:
and performing depth residual optimization on the processed local feature mapping image through the depth residual layer.
6. The method of claim 5, wherein the deconvolution neural network comprises the deconvolution layers and depth residual layers arranged in an alternating cascade, wherein a first one of the deconvolution layers uses an upsampled image obtained by upsampling the central point as the input image, and a subsequent one of the deconvolution layers uses the local feature map subjected to depth residual optimization by the previous depth residual layer as the input image.
7. A method for classifying a three-dimensional point cloud based on a deconvolution neural network, the method comprising the method for structuring a three-dimensional point cloud based on a deconvolution neural network according to any one of claims 1 to 6; and
and classifying the fused image through a classification network so as to realize the classification of the three-dimensional point cloud.
8. The method for classifying a three-dimensional point cloud based on a deconvolution neural network of claim 7, wherein the classification network comprises at least one cascaded group of a convolution layer, a batch normalization layer, an activation function layer and a second maximum pooling layer, and a fully connected layer connected to the second maximum pooling layer of the last group;
the step of classifying the fused image through a classification network comprises:
the convolution layer performs a two-dimensional convolution operation on the fused image, or on the fused map image output by the second maximum pooling layer of the previous group, as an input image, so as to extract feature data;
the batch normalization layer normalizes the feature data;
the activation function layer linearly activates the normalized feature data, wherein a rectified linear unit (ReLU) is selected as the activation function;
the second maximum pooling layer performs maximum pooling on the activation data to form the fused map image;
and the fully connected layer classifies the fused map image output by the second maximum pooling layer of the last group and outputs a classification result.
9. An image classification device based on a deconvolution neural network, the device comprising a memory, a processor, the processor coupled to the memory;
the processor, in cooperation with the memory, is operative to implement the method for classifying a three-dimensional point cloud based on a deconvolution neural network of any one of claims 7-8.
10. An apparatus having a storage function, characterized in that the apparatus stores program data which, when executed, is capable of implementing the method of structuring a three-dimensional point cloud based on a deconvolution neural network according to any one of claims 1 to 6, or which, when executed, is capable of implementing the method of classifying a three-dimensional point cloud based on a deconvolution neural network according to any one of claims 7 to 8.
CN201910960562.3A — 2019-10-10 — Three-dimensional point cloud structuring method, classification method, equipment and device — Pending — CN110956194A

Priority Application (1)

Application Number: CN201910960562.3A — Priority date: 2019-10-10 — Filing date: 2019-10-10
Title: Three-dimensional point cloud structuring method, classification method, equipment and device

Publication (1)

Publication Number: CN110956194A — Publication Date: 2020-04-03

Family ID: 69976351 — Country: CN (China)


Patent Citations (7)

* Cited by examiner, † Cited by third party

US20130022241A1 * — 2011-07-22 / 2013-01-24 — Raytheon Company — Enhancing GMAPD LADAR images using 3-D Wallis statistical differencing
US20190122378A1 * — 2017-04-17 / 2019-04-25 — The United States of America, as represented by the Secretary of the Navy — Apparatuses and methods for machine vision systems including creation of a point cloud model and/or three dimensional model based on multiple images from different perspectives and combination of depth cues from camera motion and defocus with various applications including navigation systems, and pattern matching systems as well as estimating relative blur between images for use in depth from defocus or autofocusing applications
WO2019060125A1 * — 2017-09-22 / 2019-03-28 — Zoox, Inc. — Three-dimensional bounding box from two-dimensional image and point cloud data
CN108416318A * — 2018-03-22 / 2018-08-17 — University of Electronic Science and Technology of China — Synthetic aperture radar image target deep-model recognition method based on data augmentation
CN108717569A * — 2018-05-16 / 2018-10-30 — Army Engineering University of PLA — Dilated fully convolutional neural network and construction method thereof
CN109389671A * — 2018-09-25 / 2019-02-26 — Nanjing University — Single-image three-dimensional reconstruction method based on a multi-stage neural network
CN109410321A * — 2018-10-17 / 2019-03-01 — Dalian University of Technology — Three-dimensional reconstruction method based on convolutional neural networks

Non-Patent Citations (1)

FAN CHEN: "PTINet: Converting 3D Points to 2D Images with Deconvolution for Point Cloud Classification" *

Cited By (2)

* Cited by examiner, † Cited by third party

CN111860668A * — 2020-07-27 / 2020-10-30 — Liaoning Technical University — Point cloud identification method of a deep convolutional network for raw 3D point cloud processing
CN111860668B — 2020-07-27 / granted 2024-04-02 — Liaoning Technical University — Point cloud identification method of a deep convolutional network for raw 3D point cloud processing


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination