CN117635488A - Light-weight point cloud completion method combining channel pruning and channel attention
- Publication number: CN117635488A
- Application number: CN202311604390.9A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06V20/64 — Scenes; scene-specific elements; three-dimensional objects
- G06N3/0455 — Auto-encoder networks; encoder-decoder networks
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/048 — Activation functions
- G06N3/08 — Learning methods
- G06V10/454 — Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/806 — Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06T2207/10028 — Range image; depth image; 3D point clouds
- G06T2207/20081 — Training; learning
- G06T2207/20084 — Artificial neural networks [ANN]
Abstract
The invention discloses a lightweight point cloud completion method combining channel pruning and channel attention, belonging to the technical field of computer vision. The method addresses two problems of existing networks: neglect of local point cloud information and overlong inference time. To improve inference efficiency, an efficient one-shot channel pruning technique is adopted; a channel attention module is added to the network at the feature extraction stage, the weighted features are concatenated with the global features, and a final feature vector is obtained through two layers of multi-dimensional feature extraction. The feature vector is passed into a dual-decoder structure, which generates a dense coarse point cloud through fully connected layers and per-point offsets of the input point cloud through a multi-layer perceptron; adding the offsets to the coarse point cloud yields the final refined complete point cloud. Experiments on the PCN dataset show that the method markedly improves the real-time performance of completing missing vehicle information while maintaining good completion accuracy.
Description
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a lightweight point cloud completion method combining channel pruning and channel attention.
Background
Along with the rapid development of three-dimensional sensors, point clouds are widely applied in fields such as autonomous driving, augmented reality, and robotics. However, due to weather, resolution, occlusion, and viewing-angle limitations, sensors typically acquire sparse, incomplete, and noisy point clouds, which reduce accuracy in tasks such as object recognition and segmentation. For example, in autonomous driving, the computer vision system receives and analyzes raw point clouds from the sensors in order to detect obstacles and obtain other relevant driving information; because of the incompleteness of the point cloud, particularly the fragmentation of vehicle point clouds, the accuracy of the object detection, traffic early-warning, and collision-avoidance functions of the autonomous vehicle may be reduced. Therefore, recovering the complete shape from a partial point cloud is very important for downstream tasks such as object recognition and segmentation.
With the rapid development of deep learning in recent years, deep learning is increasingly widely used in 3D vision systems. Since PointNet and PointNet++ achieved great success in point cloud processing, more deep-learning-based approaches have been applied to the three-dimensional point cloud completion task. PCN first proposed a coarse-to-fine point cloud completion framework that generates a coarse point cloud from global features learned from the partial input, and its decoder refines the coarse point cloud based on FoldingNet to finish the completion; however, it focuses only on the global features of the point cloud and ignores local feature information. PF-Net proposed a brand-new network architecture, adopting a multi-resolution encoder and a pyramid decoder to generate only the missing part of the point cloud without changing the original data, and added an adversarial loss to make the completed model finer; however, it places high requirements on point cloud density and distribution, and loses local geometric features when downsampling point clouds of low density or uneven distribution. PMPNet first converted the point cloud completion task into a point-moving path search problem: under a constraint on the total point-moving distance, it predicts the optimal moving path of each point so that the points fill the missing region as much as possible to form the final completed point cloud; but because point clouds are sparse and unstructured, PMPNet has difficulty learning the detailed features of the complete point cloud and thus struggles to generate high-quality completion results. To address PMPNet's missing details and uneven point distribution, PMPNet++ introduced a Transformer to capture shape context information, enhancing point-wise features and greatly improving completion performance; however, in the process of moving points to complete the point cloud, information in the feature vectors is inevitably lost, causing geometric deformation. To better handle missing topology, LAKe-Net proposed a new topology-aware point cloud completion model that completes the missing point cloud by locally aligning key points and adopting a "key point - skeleton - shape" completion pipeline, comprising three steps: aligned key point localization, surface skeleton generation, and shape refinement.
Although existing deep-learning-based point cloud completion networks perform well in accuracy, they contain a large number of redundant parameters, which greatly reduces inference efficiency and makes them difficult to deploy in real application scenarios such as point cloud repair for autonomous vehicles.
Disclosure of Invention
Aiming at the problems that existing networks neglect local point cloud information and have overlong inference times, the invention provides a lightweight point cloud completion method combining channel pruning and channel attention.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
step 1, acquiring point cloud data and sampling and classifying the point cloud data;
step 2, constructing a lightweight point cloud completion network model combining channel pruning and channel attention;
and step 3, inputting the point cloud data obtained in step 1 into the model obtained in step 2 to perform point cloud completion.
Further, the network model in step 2 follows the encoder-decoder structure and uses channel pruning globally to improve completion efficiency;
in the encoder-decoder structure, the encoder embeds a one-dimensional channel attention module on top of global feature extraction of the input point cloud, enhancing the characterization of local features by adaptively adjusting the weights of the global features;
in the encoder-decoder structure, the decoder adopts a dual-decoder structure of a semantic decoder and a structure refinement decoder; the semantic decoder generates a complete dense coarse point cloud, while the structure refinement decoder passes the feature vector v through shared multi-layer perceptrons to generate sub-features, fuses them, and finally outputs the offset of each point relative to the original incomplete point cloud; the offsets are added to the generated dense coarse point cloud to generate the final refined point cloud.
Further, in step 2 the network globally uses one-shot channel pruning based on the L1 norm; the L1 norm is the sum of the absolute values of the elements of the vector x, and its optimized solution is sparse, so the L1 norm is also called the sparse rule operator, specifically defined as:

‖x‖₁ = Σᵢ |xᵢ| ⑴

wherein, through the sparsity induced by the L1 norm, channels with small contribution are deleted and only channels with large contribution are retained, finally yielding the pruned new convolution; this reduces parameters with little precision loss and greatly improves completion efficiency.
Further, the one-dimensional channel attention module in step 2 is specifically as follows: the input point cloud is expressed as an m×3 matrix P, which is input into the attention module to generate a feature map that strengthens channel weights, where m is the number of input points and 3 represents the x, y, z coordinates of each point, specifically defined as:

ε = σ(CONV(P)) ⑵

wherein CONV denotes a one-dimensional convolution and σ is the activation function, producing values between 0 and 1 that represent the importance of the different channels; corresponding weights are then assigned to the channels, and ε represents the importance of each feature channel;

the result ε is multiplied by the matrix P to obtain the final output E, specifically defined as:

E = φ(P) ⑶

wherein E is the feature matrix after feature mapping and φ(·) denotes the feature mapping. In this way, different feature channels are given different weights; the one-dimensional convolution also helps to learn the relationships between feature channels more efficiently through its nonlinearity, and reduces both the number of parameters and overfitting.
Further, the encoder in step 2 performs feature extraction on the input point cloud using the one-dimensional channel attention module. The specific process is as follows: the one-dimensional channel attention module is embedded in two stacked PointNet layers to extract the geometric information of the input point cloud; each PointNet layer comprises a shared multi-layer perceptron and a max-pooling layer as basic modules;

(1) In the first PointNet layer, the matrix P is passed through the shared multi-layer perceptron to learn each point feature pᵢ; these point features form the feature matrix F, each row of which is a learned pᵢ, and F is multiplied point by point with a feature matrix E of the same size obtained from the one-dimensional channel attention module;

(2) A 256-dimensional global feature g is then obtained through a point-wise max-pooling layer; in the second PointNet layer, the global feature g and the feature matrix are taken as inputs, g is concatenated with each individual point feature pᵢ and expanded to generate an augmented point feature matrix F̃;

(3) F̃ is processed by another shared multi-layer perceptron similar to the first PointNet layer, followed by point-wise max pooling;

(4) The extracted global feature vector is v ∈ Rᵏ, where k = 1024.
Further, the decoder in step 2 uses a dual-decoder structure to complete the input point cloud, the specific completion process comprising:

(1) The feature vector v is passed into a semantic densification decoder and a structure refinement decoder respectively;

(2) The semantic densification decoder generates a sparse point cloud with a complete geometric surface using three fully connected layers: it outputs a final vector of 3N units and reshapes it into an N×3 coarse point cloud P_coarse. The points in P_coarse are then tiled to produce a dense point set P′_coarse of size rN×3, where r is the upsampling rate. Next, to fully exploit the characteristics of the input point cloud, the network concatenates the features of P′_coarse to obtain new aggregated features, which are passed through a shared multi-layer perceptron with layer sizes [512, 512, 3] to generate a new rN×3 matrix M′; this shared multi-layer perceptron can be viewed as a nonlinear map that converts the 2D grid into a smooth 2D manifold in 3D space. Finally, the coordinates of each point in P′_coarse are added to the matrix M′ to generate a dense point cloud P_dense of size rN×3;

(3) To address the loss of detail and uneven density distribution of the dense point cloud, the structure refinement decoder comprises a root node N₀ that receives the feature vector v and uses M₁ multi-layer perceptrons to generate M₁ feature vectors of dimension C, corresponding to the M₁ child nodes of the first layer of the hierarchy. The feature vector of each node at level i ≥ 1 is then concatenated with the global feature v generated by the encoder and further processed by M_{i+1} multi-layer perceptrons to generate M_{i+1} sub-features for each node of level i+1; all nodes on a given layer are processed by the same shared multi-layer perceptron Mᵢ. At the last layer of the tree structure, the feature vector of dimension C = 3 generated for each leaf node serves as the offset of the original point cloud and is added to the dense point cloud P_dense to finally generate the complete refined point cloud P_refined, specifically defined as:

P_refined = R(v) + P_dense ⑷
further, the loss function of the network is defined as the topological distance between the complement target and the true value, the chamfer distance CD and the earth moving distance EMD are used as two displacement invariants to be used as the comparative unordered point cloud, and CD is selected as the complement loss, so that the calculation efficiency is micro and higher than that of EMD, and the specific definition of CD between the complement point cloud and the real point cloud of the calculation output is as follows:
wherein d CD For chamfer distance, P c Complement point cloud for output, P gt For a real point cloud, x and y are P respectively c And P gt In (1), in the first itemCalculating P c Is mapped to P gt Averaging after the nearest Euclidean distance point in (2), calculating P in the second term gt Is mapped to P c Averaging after the points with the nearest Euclidean distance; p (P) c And P gt Is not required to be the same size.
Further, with the chamfer distance selected as the completion loss computing the CD between the output completed point cloud and the real point cloud, the overall loss function of the network model is defined by the following formula:

Loss(P_coarse, P_dense, P_gt) = d_CD1(P_coarse, P_gt) + α·d_CD2(P_dense, P_gt) ⑹

wherein d_CD1 is the chamfer distance between the coarse point cloud P_coarse and the true point cloud P_gt, and d_CD2 is the chamfer distance between the dense point cloud P_dense and the true point cloud P_gt; the Loss value is inversely proportional to completion performance.
The present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of the first aspect.
The present invention provides a computer device comprising a memory and a processor, the memory storing a computer program capable of running on the processor, the processor implementing the steps of the method of the first aspect when executing the computer program.
Compared with the prior art, the invention has the following advantages:
the method is suitable for real-time scenes such as automatic driving, the reasoning speed of the network is greatly improved through a one-time channel pruning algorithm, and in order to compensate the precision loss caused by channel pruning, a one-dimensional channel attention module is designed in a feature extraction stage, so that the one-dimensional channel attention module can learn and utilize the input point cloud features better, the overall network completion speed and precision are improved, and the point cloud completion can be widely applied to the real-time scenes.
Drawings
FIG. 1 is a schematic diagram of a lightweight point cloud completion method combining channel pruning and channel attention according to the present invention;
FIG. 2 is a schematic diagram of a one-dimensional channel attention module of the present invention;
FIG. 3 is a schematic diagram of the semantic densification decoder module of the present invention;
FIG. 4 is a schematic diagram of a structure refinement decoder module of the present invention;
FIG. 5 is a visualization of point cloud completion results on the PCN dataset under different networks.
Detailed Description
The present invention will be described more fully hereinafter in order to facilitate an understanding of the present invention. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Example 1
A light-weight point cloud completion method combining channel pruning and channel attention, as shown in figure 1, comprises the following steps:
step 1: acquiring point cloud data, and sampling and classifying the point cloud data;
Specifically, the PCN dataset is used as the dataset for the lightweight vehicle point cloud completion method combining channel pruning and channel attention of the present invention. The PCN dataset is derived from the ShapeNet dataset, covering 8 categories. A dataset containing pairs of partial and complete point clouds was created using the synthetic CAD models of seven classes in the PCN dataset to train the models, comprising 27285 different model instances in total; in each class, 100 models are used for validation, 150 models for testing, and the remaining models are reserved for training. To generate the complete ground-truth point clouds, 16384 points are uniformly sampled from the surface of each CAD model. The incomplete input point clouds are not subsets of the complete point clouds; instead, each instance's CAD model is rendered as a set of depth images from different viewpoints, and these depth images are then back-projected into 3D to generate incomplete point clouds. This makes the distribution of the incomplete point clouds closer to real-world scan data and does not fix the size of the incomplete point cloud.
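As a concrete illustration of this back-projection step, the following minimal sketch (not from the patent; NumPy, with hypothetical camera intrinsics fx, fy, cx, cy) converts one rendered depth image into a partial point cloud in the camera frame:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an HxW depth map into an Nx3 point cloud (camera frame)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    valid = depth > 0                                # keep pixels that hit the model
    z = depth[valid]
    x = (u[valid] - cx) * z / fx                     # pinhole inverse projection
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=-1)              # one view's incomplete point cloud
```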
Step 2: constructing the lightweight point cloud completion network model combining channel pruning and channel attention (LVPC-Net);
Specifically, the network model follows the encoder-decoder architecture and uses channel pruning globally to improve completion efficiency. The encoder embeds a one-dimensional channel attention module on top of global feature extraction of the input point cloud, enhancing the characterization of local features by adaptively adjusting the weights of the global features. The decoder adopts a dual-decoder structure consisting of a semantic decoder and a structure refinement decoder: the semantic decoder generates a complete dense coarse point cloud, while the structure refinement decoder passes the feature vector v through shared multi-layer perceptrons to generate sub-features and fuses them into the offset of each point relative to the original incomplete point cloud. Finally, the offsets are added to the generated dense coarse point cloud to produce the final refined point cloud.
The encoder extracts point cloud features using PCN as a backbone network.
The network globally uses one-shot channel pruning based on the L1 norm. The L1 norm is the sum of the absolute values of the elements of the vector x, and its optimized solution is sparse, so the L1 norm is also called the sparse rule operator, specifically defined as:

‖x‖₁ = Σᵢ |xᵢ| ⑴

Through the sparsity induced by the L1 norm, channels with small contribution are deleted and only channels with large contribution are retained, finally yielding the pruned new convolution; this reduces parameters with little precision loss and greatly improves completion efficiency.
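For concreteness, the following is a minimal sketch of one-shot L1-norm channel pruning for a single Conv1d layer, assuming PyTorch; the keep ratio and the layer rewiring are illustrative assumptions rather than the patent's exact procedure, while ranking channels by the L1 norm of their filter weights follows the description above:

```python
import torch
import torch.nn as nn

def prune_conv1d(conv: nn.Conv1d, keep_ratio: float = 0.75):
    # L1 norm of each output channel's filter: sum over (in_channels, kernel)
    l1 = conv.weight.detach().abs().sum(dim=(1, 2))
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    keep = torch.argsort(l1, descending=True)[:n_keep]  # largest-contribution channels
    pruned = nn.Conv1d(conv.in_channels, n_keep, conv.kernel_size[0],
                       bias=conv.bias is not None)
    pruned.weight.data = conv.weight.data[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    # `keep` lets the next layer select its matching input channels
    return pruned, keep
```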
The one-dimensional channel attention module is specifically defined as follows: the input point cloud is represented as an m×3 matrix P, where m is the number of input points and 3 represents the x, y, z coordinates of each point. P is then input into the attention module to generate a feature map that strengthens channel weights, specifically defined as:

ε = σ(CONV(P)) ⑵

wherein CONV denotes a one-dimensional convolution and σ is the activation function, producing values between 0 and 1 that represent the importance of the different channels; corresponding weights are then assigned to the channels, and ε represents the importance of each feature channel.

The result ε is then multiplied by the matrix P to obtain the final output E, specifically defined as:

E = φ(P) ⑶

wherein E is the feature matrix after feature mapping and φ(·) denotes the feature mapping. In this way, different feature channels are given different weights; the one-dimensional convolution also helps to learn the relationships between feature channels more efficiently through its nonlinearity, and reduces both the number of parameters and overfitting.
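A minimal sketch of this module, assuming PyTorch and a (batch, channels, m) layout; the kernel size is an assumption, since the text does not specify it:

```python
import torch
import torch.nn as nn

class ChannelAttention1D(nn.Module):
    def __init__(self, channels: int = 3, kernel_size: int = 1):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=kernel_size // 2)
        self.sigmoid = nn.Sigmoid()          # sigma in Eq. (2): weights in (0, 1)

    def forward(self, p: torch.Tensor) -> torch.Tensor:
        eps = self.sigmoid(self.conv(p))     # per-channel importance, Eq. (2)
        return eps * p                       # weighted feature matrix E, Eq. (3)
```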
The encoder performs feature extraction on the input point cloud using the one-dimensional channel attention module, the specific process being as follows: the one-dimensional channel attention module is embedded in two stacked PointNet layers to extract the geometric information of the input point cloud; each PointNet layer comprises a shared multi-layer perceptron and a max-pooling layer as basic modules. First, in the first PointNet layer, the matrix P is passed through the shared multi-layer perceptron to learn each point feature pᵢ; these point features form the feature matrix F, each row of which is a learned pᵢ, and F is multiplied point by point with a feature matrix E of the same size obtained from the one-dimensional channel attention module. Second, a 256-dimensional global feature g is obtained through a point-wise max-pooling layer. Then, in the second PointNet layer, the global feature g and the feature matrix are taken as inputs: g is concatenated with each individual point feature pᵢ and expanded to generate an augmented point feature matrix F̃. Next, F̃ passes through another shared multi-layer perceptron, similar to the first PointNet layer, and point-wise max pooling. Finally, the 1024-dimensional global feature vector v ∈ Rᵏ is extracted, where k = 1024.
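The following rough sketch assembles the two stacked PointNet layers with the attention module embedded, assuming PyTorch and the ChannelAttention1D class from the previous sketch; the 256- and 1024-dimensional widths follow the text, while the hidden widths are assumptions. Here the attention output E (already the reweighted features per Eq. ⑶) is used directly as the weighted feature matrix:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.mlp1 = nn.Sequential(nn.Conv1d(3, 128, 1), nn.ReLU(),
                                  nn.Conv1d(128, 256, 1))   # first shared MLP
        self.attn = ChannelAttention1D(channels=256)         # from the sketch above
        self.mlp2 = nn.Sequential(nn.Conv1d(512, 512, 1), nn.ReLU(),
                                  nn.Conv1d(512, 1024, 1))  # second shared MLP

    def forward(self, p: torch.Tensor) -> torch.Tensor:      # p: (B, 3, m)
        f = self.mlp1(p)                                      # point features F: (B, 256, m)
        f = self.attn(f)                                      # weighted features E = eps * F
        g = torch.max(f, dim=2, keepdim=True).values          # 256-d global feature g
        f_aug = torch.cat([f, g.expand(-1, -1, f.size(2))], dim=1)  # augmented F~: (B, 512, m)
        v = torch.max(self.mlp2(f_aug), dim=2).values         # 1024-d global vector v
        return v
```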
The decoder uses the dual-decoder structure to complete the input point cloud; the specific completion process comprises the following steps (a code sketch follows the list):

(1) The feature vector v is passed into a semantic densification decoder and a structure refinement decoder respectively;

(2) The semantic densification decoder uses three fully connected layers to generate a sparse point cloud with a complete geometric surface: it outputs a final vector of 3N units and reshapes it into an N×3 coarse point cloud P_coarse. The points in P_coarse are then tiled to produce a dense point set P′_coarse of size rN×3, where r is the upsampling rate. Next, to fully exploit the characteristics of the input point cloud, the network concatenates the features of P′_coarse to obtain new aggregated features and passes them through a shared multi-layer perceptron with layer sizes [512, 512, 3] to generate a new rN×3 matrix M′; this shared multi-layer perceptron can be viewed as a nonlinear map that converts the 2D grid into a smooth 2D manifold in 3D space. Finally, the coordinates of each point in P′_coarse are added to the matrix M′ to generate a dense point cloud P_dense of size rN×3;

(3) To address the loss of detail and uneven density distribution of the dense point cloud, the structure refinement decoder comprises a root node N₀ that receives the feature vector v and uses M₁ multi-layer perceptrons to generate M₁ feature vectors of dimension C, corresponding to the M₁ child nodes of the first layer of the hierarchy. The feature vector of each node at level i ≥ 1 is then concatenated with the global feature v generated by the encoder and further processed by M_{i+1} multi-layer perceptrons to generate M_{i+1} sub-features for each node of level i+1; all nodes on a given layer are processed by the same shared multi-layer perceptron Mᵢ. At the last layer of the tree structure, the feature vector of dimension C = 3 generated for each leaf node serves as the offset of the original point cloud and is added to the dense point cloud P_dense to finally generate the complete refined point cloud P_refined, specifically defined as:

P_refined = R(v) + P_dense ⑷
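A condensed sketch of the dual decoder, assuming PyTorch; N, r, the folding-grid construction, and the flattened refinement branch (a stand-in for the tree of shared multi-layer perceptrons described in (3)) are assumptions for illustration:

```python
import torch
import torch.nn as nn

class DualDecoder(nn.Module):
    def __init__(self, n_coarse: int = 512, r: int = 4, feat_dim: int = 1024):
        super().__init__()
        self.n_coarse, self.r = n_coarse, r   # assumes r is a perfect square (e.g. 4)
        # semantic densification branch: three fully connected layers -> 3N units
        self.fc = nn.Sequential(nn.Linear(feat_dim, 1024), nn.ReLU(),
                                nn.Linear(1024, 1024), nn.ReLU(),
                                nn.Linear(1024, 3 * n_coarse))
        # shared MLP of sizes [512, 512, 3] acting on (tiled point, 2D grid, v)
        self.fold = nn.Sequential(nn.Conv1d(3 + 2 + feat_dim, 512, 1), nn.ReLU(),
                                  nn.Conv1d(512, 512, 1), nn.ReLU(),
                                  nn.Conv1d(512, 3, 1))
        # refinement branch: per-point offsets R(v), flattened stand-in for the tree
        self.refine = nn.Sequential(nn.Linear(feat_dim, 1024), nn.ReLU(),
                                    nn.Linear(1024, 3 * n_coarse * r))

    def forward(self, v: torch.Tensor):
        b = v.size(0)
        p_coarse = self.fc(v).view(b, self.n_coarse, 3)       # N x 3 coarse cloud
        tiled = p_coarse.repeat_interleave(self.r, dim=1)     # rN x 3 tiled points
        side = int(self.r ** 0.5)                             # 2D folding grid seeds
        gs = torch.linspace(-0.05, 0.05, side, device=v.device)
        grid = torch.stack(torch.meshgrid(gs, gs, indexing="ij"), -1).reshape(-1, 2)
        grid = grid.repeat(self.n_coarse, 1).unsqueeze(0).expand(b, -1, -1)
        vv = v.unsqueeze(1).expand(-1, tiled.size(1), -1)
        m = self.fold(torch.cat([tiled, grid, vv], dim=2).transpose(1, 2)).transpose(1, 2)
        p_dense = tiled + m                                   # add M' to tiled coords
        p_refined = p_dense + self.refine(v).view(b, -1, 3)   # Eq. (4): R(v) + P_dense
        return p_coarse, p_dense, p_refined
```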
and selecting the chamfer distance as the CD between the complement point cloud and the real point cloud which are output by the complement loss calculation, wherein the calculation process is expressed by the following formula:
wherein P is c Complement point cloud for output, P gt For a real point cloud, x and y are P respectively c And P gt In (1), calculate P in the first term c Is mapped to P gt The euclidean distance in (c) is averaged after the nearest point and vice versa. Thus, P c And P gt Is not required to be the same size.
The overall loss function of the network model is defined as follows:

Loss(P_coarse, P_dense, P_gt) = d_CD1(P_coarse, P_gt) + α·d_CD2(P_dense, P_gt) ⑹

wherein d_CD1 is the chamfer distance between the coarse point cloud P_coarse and the true point cloud P_gt, and d_CD2 is the chamfer distance between the dense point cloud P_dense and the true point cloud P_gt; the Loss value is inversely proportional to completion performance.
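A minimal dense-pairwise sketch of the chamfer distance of Eq. ⑸ and the total loss of Eq. ⑹, assuming PyTorch; production implementations typically use specialized CUDA kernels instead of torch.cdist:

```python
import torch

def chamfer_distance(pc: torch.Tensor, pgt: torch.Tensor) -> torch.Tensor:
    """pc: (B, N1, 3), pgt: (B, N2, 3); N1 and N2 may differ."""
    dist = torch.cdist(pc, pgt)                  # (B, N1, N2) Euclidean distances
    return (dist.min(dim=2).values.mean(dim=1)   # each pc point -> nearest gt point
            + dist.min(dim=1).values.mean(dim=1))  # each gt point -> nearest pc point

def total_loss(p_coarse, p_dense, p_gt, alpha: float = 1.0):
    # Eq. (6): d_CD1(P_coarse, P_gt) + alpha * d_CD2(P_dense, P_gt)
    return (chamfer_distance(p_coarse, p_gt)
            + alpha * chamfer_distance(p_dense, p_gt)).mean()
```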
Step 3: inputting the PCN dataset obtained in step 1 into the model obtained in step 2 to perform completion of the incomplete point clouds;
Specifically, the running environment is Ubuntu 18.04; the network is trained for 400 epochs with an Adam optimizer at an initial learning rate of 0.0001, the batch size is set to 32, and the learning rate decays by a factor of 0.7 every 50 iterations. The results were obtained through multiple tests; see Tables 1, 2 and 3.
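The stated schedule maps onto a standard PyTorch setup as in the sketch below; the model and data loader are hypothetical placeholders, and the 0.7 decay is stepped per epoch-group of 50 here (the text says "iterations", so this granularity is an assumption):

```python
import torch

# Placeholders (assumptions): LVPC-Net assembled from the sketches above,
# and a loader yielding (partial, ground_truth) PCN pairs.
model = build_lvpc_net()                          # hypothetical constructor
train_loader = make_pcn_loader(batch_size=32)     # hypothetical data loader

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.7)

for epoch in range(400):
    for partial, gt in train_loader:
        p_coarse, p_dense, p_refined = model(partial)
        loss = total_loss(p_coarse, p_dense, gt)  # from the previous sketch
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()                              # learning rate x0.7 every 50 steps
```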
TABLE 1 Comparison of point cloud completion results (CD) for different networks
Verification on the PCN dataset uses CD (×10³), F-score (%), single-frame completion time (ms), FPS and the speedup ratio (Speedup) as evaluation metrics. The smaller the CD value, the closer the predicted completion result is to the shape of the real point cloud. As can be seen from Table 1, the CD value of LVPC-Net is better than most current point cloud completion networks: the average CD value of LVPC-Net is reduced by 8.31% compared with the other networks (except PMPNet++). Although its average CD value is 0.77 higher than the best-performing PMPNet++, it still reaches more than 90% of that network's performance and ranks near the front among completion networks. Compared with other networks, LVPC-Net improves performance in the feature extraction stage by adding the one-dimensional channel attention module (ODAM), so that the important features of the input point cloud are better utilized, ensuring effective prediction of the completed point cloud.
TABLE 2 Point cloud completion results for different networks (F-score)
The greater the F-score value, the higher the accuracy of the network's predicted completion results. As can be seen from Table 2, the F-score of LVPC-Net is better than that of most point cloud completion networks: its average F-score is improved by 2.14% over the other networks (except PMPNet++) and is 2.19% lower than the optimal PMPNet++, still achieving 95% of that network's performance and remaining at an advanced level. Because the original offsets of the input point cloud are preserved while refining and generating the dense point cloud, network accuracy is improved, which greatly helps the point cloud completion.
TABLE 3 Comparison of point cloud completion time (ms, Speedup) for different networks
As can be seen from Table 3, the completion time of LVPC-Net is far less than that of the existing accuracy-centered point cloud completion networks; the average speedup ratio reaches 10.36, with the highest speedup, 12.21, obtained against PMPNet++. In high-real-time scenarios such as autonomous driving, tasks are always time-critical. PCN, GRNet, PMPNet and PMPNet++ generally run at less than 10 FPS, which hardly meets the real-time requirement of autonomous driving. Because the one-shot channel pruning algorithm deletes some unimportant channels, LVPC-Net is lighter and more efficient and completes a point cloud in only 15 ms, i.e., about 67 FPS during autonomous driving, which is sufficient to meet the real-time requirement; although the accuracy decreases, it remains completely within an acceptable range.
FIG. 5 is a visual comparison of completion results under different networks, from which it can be seen that LVPC-Net preserves fine details in the completed results. PCN focuses only on the global features of the point cloud in the feature extraction stage, so its completion results are still deficient in detail, and regressing a large number of points with fully connected layers can produce unevenly distributed points, as in the completion of the desk lamp, sofa and table models, where many noise points appear. The completion process of GRNet uses information from two different data modalities (point cloud and voxels); however, GRNet's voxel representation can only reconstruct low-resolution shapes, so for a structural model such as a car with smooth curved surfaces the completion result is uneven and contains many noise points. For PMPNet, since the point cloud is completed by moving each point of the incomplete input, gaps in the completed missing regions are too large and the points are unevenly distributed, as in the airplane and car models; many noise points even appear when completing the chair model. PMPNet++ adds a Transformer module on the basis of PMPNet to enhance the ability to learn point features, aiming to predict a more accurate displacement for each point, but it still has shortcomings in some details, such as missing rear-view mirrors in the car completion results. The method of the present invention obtains comparatively good completion results in all of the above cases, which shows that LVPC-Net learns the key point features of the input point cloud more effectively through the channel attention mechanism, greatly helping the completion results; meanwhile, in the point cloud refinement stage, the structure refinement decoder is introduced to obtain the offsets of the original input point cloud, so that the network retains the detail features of the input point cloud during completion, making the completion results finer.
It is noted that embodiments of the present invention may be provided as methods, systems, and/or computer program products. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Various aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.
What is not described in detail in the present specification belongs to the prior art known to those skilled in the art. While the foregoing describes illustrative embodiments of the present invention to facilitate an understanding of the present invention by those skilled in the art, it should be understood that the present invention is not limited to the scope of the embodiments, but is to be construed as protected by the accompanying claims insofar as various changes are within the spirit and scope of the present invention as defined and defined by the appended claims.
Claims (10)
1. A lightweight point cloud completion method combining channel pruning and channel attention, characterized by comprising the following steps:
step 1, acquiring point cloud data and sampling and classifying the point cloud data;
step 2, constructing a lightweight point cloud completion network model combining channel pruning and channel attention;
and step 3, inputting the point cloud data obtained in step 1 into the model obtained in step 2 to perform point cloud completion.
2. The method of claim 1, wherein the network model in step 2 follows the encoder-decoder structure and uses channel pruning globally to improve completion efficiency;
in the encoder-decoder, the encoder embeds a one-dimensional channel attention module on top of global feature extraction of the input point cloud, enhancing the characterization of local features by adaptively adjusting the weights of the global features;
in the encoder-decoder, the decoder adopts a dual-decoder structure of a semantic decoder and a structure refinement decoder; the semantic decoder generates a complete dense coarse point cloud, and the structure refinement decoder passes the feature vector v through shared multi-layer perceptrons to generate sub-features, fuses them, and finally outputs the offset of each point relative to the original incomplete point cloud; the offsets are added to the generated dense coarse point cloud to generate the final refined point cloud.
3. The lightweight point cloud completion method combining channel pruning and channel attention according to claim 2, wherein the encoder embeds a one-dimensional channel attention module on top of global feature extraction of the input point cloud, and the network globally uses one-shot channel pruning based on the L1 norm; the L1 norm is the sum of the absolute values of the elements of the vector x, specifically defined as:

‖x‖₁ = Σᵢ |xᵢ| (1)

wherein, through the sparsity induced by the L1 norm, channels with small contribution are deleted and only channels with large contribution are retained, finally yielding the pruned new convolution, thereby reducing parameters with little precision loss and greatly improving completion efficiency.
4. The lightweight point cloud completion method combining channel pruning and channel attention according to claim 2, wherein the one-dimensional channel attention module in step 2 is specifically as follows: the input point cloud is expressed as an m×3 matrix P, which is input into the attention module to generate a feature map that strengthens channel weights, where m is the number of input points and 3 represents the x, y, z coordinates of each point, specifically defined as:

ε = σ(CONV(P)) (2)

wherein CONV denotes a one-dimensional convolution and σ is the activation function, producing values between 0 and 1 that represent the importance of the different channels; corresponding weights are then assigned to the channels, and ε represents the importance of each feature channel;

the result ε is multiplied by the matrix P to obtain the final output E, specifically defined as:

E = φ(P) (3)

wherein E is the feature matrix after feature mapping and φ(·) denotes the feature mapping.
5. The lightweight point cloud completion method combining channel pruning and channel attention according to claim 2, wherein the encoder embeds a one-dimensional channel attention module on top of global feature extraction of the input point cloud and uses it for feature extraction, the specific process being: the one-dimensional channel attention module is embedded in two stacked PointNet layers to extract the geometric information of the input point cloud; each PointNet layer comprises a shared multi-layer perceptron and a max-pooling layer as basic modules;

(1) in the first PointNet layer, the matrix P is passed through the shared multi-layer perceptron to learn each point feature pᵢ; these point features form the feature matrix F, each row of which is a learned pᵢ, and F is multiplied point by point with a feature matrix E of the same size obtained from the one-dimensional channel attention module;

(2) a 256-dimensional global feature g is then obtained through a point-wise max-pooling layer; in the second PointNet layer, the global feature g and the feature matrix are taken as inputs, g is concatenated with each individual point feature pᵢ and expanded to generate an augmented point feature matrix F̃;

(3) F̃ is processed by another shared multi-layer perceptron similar to the first PointNet layer, followed by point-wise max pooling;

(4) the extracted global feature vector is v ∈ Rᵏ, where k = 1024.
6. The lightweight point cloud completion method combining channel pruning and channel attention according to claim 2, wherein the decoder adopts a dual-decoder structure of a semantic decoder and a structure refinement decoder and uses it to complete the input point cloud, the specific completion process comprising:

(1) the feature vector v is passed into a semantic densification decoder and a structure refinement decoder respectively;

(2) the semantic densification decoder generates a sparse point cloud with a complete geometric surface using three fully connected layers: it outputs a final vector of 3N units and reshapes it into an N×3 coarse point cloud P_coarse; the points in P_coarse are then tiled to produce a dense point set P′_coarse of size rN×3, where r is the upsampling rate; next, to fully exploit the characteristics of the input point cloud, the network concatenates the features of P′_coarse to obtain new aggregated features and passes them through a shared multi-layer perceptron with layer sizes [512, 512, 3] to generate a new rN×3 matrix M′; finally, the coordinates of each point in P′_coarse are added to the matrix M′ to generate a dense point cloud P_dense of size rN×3;

(3) the structure refinement decoder comprises a root node N₀ for receiving the feature vector v, which uses M₁ multi-layer perceptrons to generate M₁ feature vectors of dimension C, corresponding to the M₁ child nodes of the first layer of the hierarchy; the feature vector of each node at level i ≥ 1 is then concatenated with the global feature v generated by the encoder and further processed by M_{i+1} multi-layer perceptrons to generate M_{i+1} sub-features for each node of level i+1; all nodes on a given layer are processed by the same shared multi-layer perceptron Mᵢ; at the last layer of the tree structure, the feature vector of dimension C = 3 generated for each leaf node serves as the offset of the original point cloud and is added to the dense point cloud P_dense to finally generate the complete refined point cloud P_refined, specifically defined as:

P_refined = R(v) + P_dense (4)
7. The lightweight point cloud completion method combining channel pruning and channel attention according to claim 2, wherein the loss function of the network is defined as the topological distance between the completion target and the ground truth; the chamfer distance CD and the earth mover's distance EMD are two permutation-invariant metrics for comparing unordered point clouds, and CD is selected as the completion loss; the CD between the output completed point cloud and the real point cloud is computed as shown in the following formula:

d_CD(P_c, P_gt) = (1/|P_c|) Σ_{x∈P_c} min_{y∈P_gt} ‖x − y‖₂ + (1/|P_gt|) Σ_{y∈P_gt} min_{x∈P_c} ‖x − y‖₂ (5)

wherein d_CD is the chamfer distance, P_c is the output completed point cloud, P_gt is the real point cloud, and x and y are points of P_c and P_gt respectively; the first term maps each point of P_c to its nearest Euclidean-distance point in P_gt and averages, and the second term maps each point of P_gt to its nearest Euclidean-distance point in P_c and averages; P_c and P_gt are not required to be the same size.
8. The lightweight point cloud completion method combining channel pruning and channel attention according to claim 7, wherein, with the chamfer distance selected as the completion loss computing the CD between the output completed point cloud and the real point cloud, the overall loss function of the network model is defined by the following formula:

Loss(P_coarse, P_dense, P_gt) = d_CD1(P_coarse, P_gt) + α·d_CD2(P_dense, P_gt) (6)

wherein d_CD1 is the chamfer distance between the coarse point cloud P_coarse and the true point cloud P_gt, and d_CD2 is the chamfer distance between the dense point cloud P_dense and the true point cloud P_gt; the Loss value is inversely proportional to completion performance.
9. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 8.
10. A computer device comprising a memory and a processor, the memory storing a computer program capable of running on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 8.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311604390.9A | 2023-11-28 | 2023-11-28 | Light-weight point cloud completion method combining channel pruning and channel attention |

Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN117635488A | 2024-03-01 |

Cited By (1)

| Publication Number | Priority Date | Publication Date | Assignee | Title |
|---|---|---|---|---|
| CN118411389A | 2024-07-02 | 2024-07-30 | 厦门市礼小签电子科技有限公司 | Virtual space entity behavior prediction method based on point cloud identification |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |