CN117635488A - Light-weight point cloud completion method combining channel pruning and channel attention
- Publication number: CN117635488A
- Application number: CN202311604390.9A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06V20/64 — Scenes; scene-specific elements; three-dimensional objects
- G06N3/0455 — Auto-encoder networks; encoder-decoder networks
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/048 — Activation functions
- G06N3/08 — Learning methods
- G06V10/454 — Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/806 — Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06T2207/10028 — Range image; depth image; 3D point clouds
- G06T2207/20081 — Training; learning
- G06T2207/20084 — Artificial neural networks [ANN]
Abstract
The invention discloses a lightweight point cloud completion method combining channel pruning and channel attention, belonging to the technical field of computer vision. The method addresses two problems of existing networks: neglect of local point cloud information and overlong inference time. To improve inference efficiency, an efficient one-shot channel pruning technique is adopted; a channel attention module is added to the network at the feature extraction stage, the weighted features are concatenated with the global features, and a final feature vector is obtained through two layers of multi-dimensional feature extraction. The feature vector is passed into a dual-decoder structure, which generates a dense coarse point cloud through fully connected layers and per-point offsets of the input point cloud through a multi-layer perceptron; adding the offsets to the coarse point cloud yields the final refined complete point cloud. Experiments on the PCN dataset show that the method markedly improves the real-time performance of completing missing vehicle information while maintaining good completion accuracy.
Description
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a lightweight point cloud completion method combining channel pruning and channel attention.
Background
Along with the rapid development of three-dimensional sensors, point clouds are widely applied in fields such as autonomous driving, augmented reality, and robotics. However, due to weather, resolution, occlusion, and viewing-angle limitations, sensors typically acquire sparse, incomplete, and noisy point clouds, which reduce accuracy in tasks such as object recognition and segmentation. For example, in autonomous driving, the computer vision system receives and analyzes raw point clouds from the sensors in order to detect obstacles and obtain other relevant driving information; because of the incompleteness of the point cloud, particularly the fragmentation of vehicle point clouds, the accuracy of the object detection, traffic early-warning, and collision-avoidance functions of the autonomous vehicle may be reduced. Therefore, recovering the complete shape from a partial point cloud is very important for downstream tasks such as object recognition and segmentation.
With the rapid development of deep learning in recent years, deep learning is increasingly widely used in 3D vision systems. Since PointNet and PointNet++ achieved great success in point cloud processing, more deep-learning-based approaches have been applied to the three-dimensional point cloud completion task. PCN first proposed a coarse-to-fine point cloud completion framework that generates a coarse point cloud from global features learned from the partial input, and its decoder refines the coarse point cloud based on FoldingNet to finish the completion; however, it focuses only on the global features of the point cloud and ignores local feature information. PF-Net proposed a brand-new network architecture, adopting a multi-resolution encoder and a pyramid decoder to generate only the missing part of the point cloud without changing the original data, and added an adversarial loss to make the completed model finer; however, it places high requirements on point cloud density and distribution, and loses local geometric features when downsampling point clouds of low density or uneven distribution. PMPNet first converted the point cloud completion task into a point-moving path search problem: under a constraint on the total point-moving distance, it predicts the optimal moving path of each point so that the points fill the missing region as much as possible to form the final completed point cloud; but because point clouds are sparse and unstructured, PMPNet has difficulty learning the detailed features of the complete point cloud and thus struggles to generate high-quality completion results. To address PMPNet's missing details and uneven point distribution, PMPNet++ introduced a Transformer to capture shape context information, enhancing point-wise features and greatly improving completion performance; however, in the process of moving points to complete the point cloud, information in the feature vectors is inevitably lost, causing geometric deformation. To better handle missing topology, LAKe-Net proposed a new topology-aware point cloud completion model that completes the missing point cloud by locally aligning key points and adopting a "key point - skeleton - shape" completion pipeline, comprising three steps: aligned key point localization, surface skeleton generation, and shape refinement.
Although existing deep-learning-based point cloud completion networks perform well in accuracy, they contain a large number of redundant parameters, which greatly reduces inference efficiency and makes them difficult to deploy in real application scenarios such as point cloud repair for autonomous vehicles.
Disclosure of Invention
Aiming at the problems that existing networks neglect local point cloud information and have overlong inference times, the invention provides a lightweight point cloud completion method combining channel pruning and channel attention.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
step 1, acquiring point cloud data and sampling and classifying the point cloud data;
step 2, constructing a lightweight point cloud completion network model combining channel pruning and channel attention;
and step 3, inputting the point cloud data obtained in step 1 into the model obtained in step 2 to perform point cloud completion.
Further, the network model in step 2 follows the encoder-decoder structure and uses channel pruning globally to improve completion efficiency;
in the encoder-decoder structure, the encoder embeds a one-dimensional channel attention module on top of global feature extraction of the input point cloud, enhancing the characterization of local features by adaptively adjusting the weights of the global features;
in the encoder-decoder structure, the decoder adopts a dual-decoder structure of a semantic decoder and a structure refinement decoder; the semantic decoder generates a complete dense coarse point cloud, while the structure refinement decoder passes the feature vector v through shared multi-layer perceptrons to generate sub-features, fuses them, and finally outputs the offset of each point relative to the original incomplete point cloud; the offsets are added to the generated dense coarse point cloud to generate the final refined point cloud.
Further, in step 2 the network globally uses one-shot channel pruning based on the L1 norm; the L1 norm is the sum of the absolute values of the elements of the vector x, and its optimized solution is sparse, so the L1 norm is also called the sparse rule operator, specifically defined as:

‖x‖₁ = Σᵢ |xᵢ| ⑴

wherein, through the sparsity induced by the L1 norm, channels with small contribution are deleted and only channels with large contribution are retained, finally yielding the pruned new convolution; this reduces parameters with little precision loss and greatly improves completion efficiency.
Further, the one-dimensional channel attention module in step 2 is specifically as follows: the input point cloud is expressed as an m×3 matrix P, which is input into the attention module to generate a feature map that strengthens channel weights, where m is the number of input points and 3 represents the x, y, z coordinates of each point, specifically defined as:

ε = σ(CONV(P)) ⑵

wherein CONV denotes a one-dimensional convolution and σ is the activation function, producing values between 0 and 1 that represent the importance of the different channels; corresponding weights are then assigned to the channels, and ε represents the importance of each feature channel;

the result ε is multiplied by the matrix P to obtain the final output E, specifically defined as:

E = φ(P) ⑶

wherein E is the feature matrix after feature mapping and φ(·) denotes the feature mapping. In this way, different feature channels are given different weights; the one-dimensional convolution also helps to learn the relationships between feature channels more efficiently through its nonlinearity, and reduces both the number of parameters and overfitting.
Further, the encoder in step 2 performs feature extraction on the input point cloud using the one-dimensional channel attention module. The specific process is as follows: the one-dimensional channel attention module is embedded in two stacked PointNet layers to extract the geometric information of the input point cloud; each PointNet layer comprises a shared multi-layer perceptron and a max-pooling layer as basic modules;

(1) In the first PointNet layer, the matrix P is passed through the shared multi-layer perceptron to learn each point feature pᵢ; these point features form the feature matrix F, each row of which is a learned pᵢ, and F is multiplied point by point with a feature matrix E of the same size obtained from the one-dimensional channel attention module;

(2) A 256-dimensional global feature g is then obtained through a point-wise max-pooling layer; in the second PointNet layer, the global feature g and the feature matrix are taken as inputs, g is concatenated with each individual point feature pᵢ and expanded to generate an augmented point feature matrix F̃;

(3) F̃ is processed by another shared multi-layer perceptron similar to the first PointNet layer, followed by point-wise max pooling;

(4) The extracted global feature vector is v ∈ Rᵏ, where k = 1024.
Further, the decoder in step 2 uses a dual-decoder structure to complete the input point cloud, the specific completion process comprising:

(1) The feature vector v is passed into a semantic densification decoder and a structure refinement decoder respectively;

(2) The semantic densification decoder generates a sparse point cloud with a complete geometric surface using three fully connected layers: it outputs a final vector of 3N units and reshapes it into an N×3 coarse point cloud P_coarse. The points in P_coarse are then tiled to produce a dense point set P′_coarse of size rN×3, where r is the upsampling rate. Next, to fully exploit the characteristics of the input point cloud, the network concatenates the features of P′_coarse to obtain new aggregated features, which are passed through a shared multi-layer perceptron with layer sizes [512, 512, 3] to generate a new rN×3 matrix M′; this shared multi-layer perceptron can be viewed as a nonlinear map that converts the 2D grid into a smooth 2D manifold in 3D space. Finally, the coordinates of each point in P′_coarse are added to the matrix M′ to generate a dense point cloud P_dense of size rN×3;

(3) To address the loss of detail and uneven density distribution of the dense point cloud, the structure refinement decoder comprises a root node N₀ that receives the feature vector v and uses M₁ multi-layer perceptrons to generate M₁ feature vectors of dimension C, corresponding to the M₁ child nodes of the first layer of the hierarchy. The feature vector of each node at level i ≥ 1 is then concatenated with the global feature v generated by the encoder and further processed by M_{i+1} multi-layer perceptrons to generate M_{i+1} sub-features for each node of level i+1; all nodes on a given layer are processed by the same shared multi-layer perceptron Mᵢ. At the last layer of the tree structure, the feature vector of dimension C = 3 generated for each leaf node serves as the offset of the original point cloud and is added to the dense point cloud P_dense to finally generate the complete refined point cloud P_refined, specifically defined as:

P_refined = R(v) + P_dense ⑷
further, the loss function of the network is defined as the topological distance between the complement target and the true value, the chamfer distance CD and the earth moving distance EMD are used as two displacement invariants to be used as the comparative unordered point cloud, and CD is selected as the complement loss, so that the calculation efficiency is micro and higher than that of EMD, and the specific definition of CD between the complement point cloud and the real point cloud of the calculation output is as follows:
wherein d CD For chamfer distance, P c Complement point cloud for output, P gt For a real point cloud, x and y are P respectively c And P gt In (1), in the first itemCalculating P c Is mapped to P gt Averaging after the nearest Euclidean distance point in (2), calculating P in the second term gt Is mapped to P c Averaging after the points with the nearest Euclidean distance; p (P) c And P gt Is not required to be the same size.
Further, with the chamfer distance selected as the completion loss computing the CD between the output completed point cloud and the real point cloud, the overall loss function of the network model is defined by the following formula:

Loss(P_coarse, P_dense, P_gt) = d_CD1(P_coarse, P_gt) + α·d_CD2(P_dense, P_gt) ⑹

wherein d_CD1 is the chamfer distance between the coarse point cloud P_coarse and the true point cloud P_gt, and d_CD2 is the chamfer distance between the dense point cloud P_dense and the true point cloud P_gt; the Loss value is inversely proportional to completion performance.
The present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of the first aspect.
The present invention provides a computer device comprising a memory and a processor, the memory storing a computer program capable of running on the processor, the processor implementing the steps of the method of the first aspect when executing the computer program.
Compared with the prior art, the invention has the following advantages:
the method is suitable for real-time scenes such as automatic driving, the reasoning speed of the network is greatly improved through a one-time channel pruning algorithm, and in order to compensate the precision loss caused by channel pruning, a one-dimensional channel attention module is designed in a feature extraction stage, so that the one-dimensional channel attention module can learn and utilize the input point cloud features better, the overall network completion speed and precision are improved, and the point cloud completion can be widely applied to the real-time scenes.
Drawings
FIG. 1 is a schematic diagram of a lightweight point cloud completion method combining channel pruning and channel attention according to the present invention;
FIG. 2 is a schematic diagram of a one-dimensional channel attention module of the present invention;
FIG. 3 is a schematic diagram of the semantic densification decoder module of the present invention;
FIG. 4 is a schematic diagram of a structure refinement decoder module of the present invention;
FIG. 5 is a visualization of point cloud completion results on the PCN dataset under different networks.
Detailed Description
The present invention will be described more fully hereinafter in order to facilitate an understanding of the present invention. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Example 1
A light-weight point cloud completion method combining channel pruning and channel attention, as shown in figure 1, comprises the following steps:
step 1: acquiring point cloud data, and sampling and classifying the point cloud data;
Specifically, the PCN dataset is used as the dataset for the lightweight vehicle point cloud completion method combining channel pruning and channel attention of the present invention. The PCN dataset is derived from the ShapeNet dataset, covering 8 categories. A dataset containing pairs of partial and complete point clouds was created using the synthetic CAD models of seven classes in the PCN dataset to train the models, comprising 27285 different model instances in total; in each class, 100 models are used for validation, 150 models for testing, and the remaining models are reserved for training. To generate the complete ground-truth point clouds, 16384 points are uniformly sampled from the surface of each CAD model. The incomplete input point clouds are not subsets of the complete point clouds; instead, each instance's CAD model is rendered as a set of depth images from different viewpoints, and these depth images are then back-projected into 3D to generate incomplete point clouds. This makes the distribution of the incomplete point clouds closer to real-world scan data and does not fix the size of the incomplete point cloud.
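As a concrete illustration of this back-projection step, the following minimal sketch (not from the patent; NumPy, with hypothetical camera intrinsics fx, fy, cx, cy) converts one rendered depth image into a partial point cloud in the camera frame:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an HxW depth map into an Nx3 point cloud (camera frame)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    valid = depth > 0                                # keep pixels that hit the model
    z = depth[valid]
    x = (u[valid] - cx) * z / fx                     # pinhole inverse projection
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=-1)              # one view's incomplete point cloud
```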
Step 2: constructing the lightweight point cloud completion network model combining channel pruning and channel attention (LVPC-Net);
Specifically, the network model follows the encoder-decoder architecture and uses channel pruning globally to improve completion efficiency. The encoder embeds a one-dimensional channel attention module on top of global feature extraction of the input point cloud, enhancing the characterization of local features by adaptively adjusting the weights of the global features. The decoder adopts a dual-decoder structure consisting of a semantic decoder and a structure refinement decoder: the semantic decoder generates a complete dense coarse point cloud, while the structure refinement decoder passes the feature vector v through shared multi-layer perceptrons to generate sub-features and fuses them into the offset of each point relative to the original incomplete point cloud. Finally, the offsets are added to the generated dense coarse point cloud to produce the final refined point cloud.
The encoder extracts point cloud features using PCN as a backbone network.
The network globally uses one-shot channel pruning based on the L1 norm. The L1 norm is the sum of the absolute values of the elements of the vector x, and its optimized solution is sparse, so the L1 norm is also called the sparse rule operator, specifically defined as:

‖x‖₁ = Σᵢ |xᵢ| ⑴

Through the sparsity induced by the L1 norm, channels with small contribution are deleted and only channels with large contribution are retained, finally yielding the pruned new convolution; this reduces parameters with little precision loss and greatly improves completion efficiency.
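For concreteness, the following is a minimal sketch of one-shot L1-norm channel pruning for a single Conv1d layer, assuming PyTorch; the keep ratio and the layer rewiring are illustrative assumptions rather than the patent's exact procedure, while ranking channels by the L1 norm of their filter weights follows the description above:

```python
import torch
import torch.nn as nn

def prune_conv1d(conv: nn.Conv1d, keep_ratio: float = 0.75):
    # L1 norm of each output channel's filter: sum over (in_channels, kernel)
    l1 = conv.weight.detach().abs().sum(dim=(1, 2))
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    keep = torch.argsort(l1, descending=True)[:n_keep]  # largest-contribution channels
    pruned = nn.Conv1d(conv.in_channels, n_keep, conv.kernel_size[0],
                       bias=conv.bias is not None)
    pruned.weight.data = conv.weight.data[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    # `keep` lets the next layer select its matching input channels
    return pruned, keep
```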
The one-dimensional channel attention module is specifically defined as follows: the input point cloud is represented as an m×3 matrix P, where m is the number of input points and 3 represents the x, y, z coordinates of each point. P is then input into the attention module to generate a feature map that strengthens channel weights, specifically defined as:

ε = σ(CONV(P)) ⑵

wherein CONV denotes a one-dimensional convolution and σ is the activation function, producing values between 0 and 1 that represent the importance of the different channels; corresponding weights are then assigned to the channels, and ε represents the importance of each feature channel.

The result ε is then multiplied by the matrix P to obtain the final output E, specifically defined as:

E = φ(P) ⑶

wherein E is the feature matrix after feature mapping and φ(·) denotes the feature mapping. In this way, different feature channels are given different weights; the one-dimensional convolution also helps to learn the relationships between feature channels more efficiently through its nonlinearity, and reduces both the number of parameters and overfitting.
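A minimal sketch of this module, assuming PyTorch and a (batch, channels, m) layout; the kernel size is an assumption, since the text does not specify it:

```python
import torch
import torch.nn as nn

class ChannelAttention1D(nn.Module):
    def __init__(self, channels: int = 3, kernel_size: int = 1):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=kernel_size // 2)
        self.sigmoid = nn.Sigmoid()          # sigma in Eq. (2): weights in (0, 1)

    def forward(self, p: torch.Tensor) -> torch.Tensor:
        eps = self.sigmoid(self.conv(p))     # per-channel importance, Eq. (2)
        return eps * p                       # weighted feature matrix E, Eq. (3)
```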
The encoder performs feature extraction on the input point cloud using the one-dimensional channel attention module, the specific process being as follows: the one-dimensional channel attention module is embedded in two stacked PointNet layers to extract the geometric information of the input point cloud; each PointNet layer comprises a shared multi-layer perceptron and a max-pooling layer as basic modules. First, in the first PointNet layer, the matrix P is passed through the shared multi-layer perceptron to learn each point feature pᵢ; these point features form the feature matrix F, each row of which is a learned pᵢ, and F is multiplied point by point with a feature matrix E of the same size obtained from the one-dimensional channel attention module. Second, a 256-dimensional global feature g is obtained through a point-wise max-pooling layer. Then, in the second PointNet layer, the global feature g and the feature matrix are taken as inputs: g is concatenated with each individual point feature pᵢ and expanded to generate an augmented point feature matrix F̃. Next, F̃ passes through another shared multi-layer perceptron, similar to the first PointNet layer, and point-wise max pooling. Finally, the 1024-dimensional global feature vector v ∈ Rᵏ is extracted, where k = 1024.
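The following rough sketch assembles the two stacked PointNet layers with the attention module embedded, assuming PyTorch and the ChannelAttention1D class from the previous sketch; the 256- and 1024-dimensional widths follow the text, while the hidden widths are assumptions. Here the attention output E (already the reweighted features per Eq. ⑶) is used directly as the weighted feature matrix:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.mlp1 = nn.Sequential(nn.Conv1d(3, 128, 1), nn.ReLU(),
                                  nn.Conv1d(128, 256, 1))   # first shared MLP
        self.attn = ChannelAttention1D(channels=256)         # from the sketch above
        self.mlp2 = nn.Sequential(nn.Conv1d(512, 512, 1), nn.ReLU(),
                                  nn.Conv1d(512, 1024, 1))  # second shared MLP

    def forward(self, p: torch.Tensor) -> torch.Tensor:      # p: (B, 3, m)
        f = self.mlp1(p)                                      # point features F: (B, 256, m)
        f = self.attn(f)                                      # weighted features E = eps * F
        g = torch.max(f, dim=2, keepdim=True).values          # 256-d global feature g
        f_aug = torch.cat([f, g.expand(-1, -1, f.size(2))], dim=1)  # augmented F~: (B, 512, m)
        v = torch.max(self.mlp2(f_aug), dim=2).values         # 1024-d global vector v
        return v
```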
The decoder uses the dual-decoder structure to complete the input point cloud; the specific completion process comprises the following steps (a code sketch follows the list):

(1) The feature vector v is passed into a semantic densification decoder and a structure refinement decoder respectively;

(2) The semantic densification decoder uses three fully connected layers to generate a sparse point cloud with a complete geometric surface: it outputs a final vector of 3N units and reshapes it into an N×3 coarse point cloud P_coarse. The points in P_coarse are then tiled to produce a dense point set P′_coarse of size rN×3, where r is the upsampling rate. Next, to fully exploit the characteristics of the input point cloud, the network concatenates the features of P′_coarse to obtain new aggregated features and passes them through a shared multi-layer perceptron with layer sizes [512, 512, 3] to generate a new rN×3 matrix M′; this shared multi-layer perceptron can be viewed as a nonlinear map that converts the 2D grid into a smooth 2D manifold in 3D space. Finally, the coordinates of each point in P′_coarse are added to the matrix M′ to generate a dense point cloud P_dense of size rN×3;

(3) To address the loss of detail and uneven density distribution of the dense point cloud, the structure refinement decoder comprises a root node N₀ that receives the feature vector v and uses M₁ multi-layer perceptrons to generate M₁ feature vectors of dimension C, corresponding to the M₁ child nodes of the first layer of the hierarchy. The feature vector of each node at level i ≥ 1 is then concatenated with the global feature v generated by the encoder and further processed by M_{i+1} multi-layer perceptrons to generate M_{i+1} sub-features for each node of level i+1; all nodes on a given layer are processed by the same shared multi-layer perceptron Mᵢ. At the last layer of the tree structure, the feature vector of dimension C = 3 generated for each leaf node serves as the offset of the original point cloud and is added to the dense point cloud P_dense to finally generate the complete refined point cloud P_refined, specifically defined as:

P_refined = R(v) + P_dense ⑷
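A condensed sketch of the dual decoder, assuming PyTorch; N, r, the folding-grid construction, and the flattened refinement branch (a stand-in for the tree of shared multi-layer perceptrons described in (3)) are assumptions for illustration:

```python
import torch
import torch.nn as nn

class DualDecoder(nn.Module):
    def __init__(self, n_coarse: int = 512, r: int = 4, feat_dim: int = 1024):
        super().__init__()
        self.n_coarse, self.r = n_coarse, r   # assumes r is a perfect square (e.g. 4)
        # semantic densification branch: three fully connected layers -> 3N units
        self.fc = nn.Sequential(nn.Linear(feat_dim, 1024), nn.ReLU(),
                                nn.Linear(1024, 1024), nn.ReLU(),
                                nn.Linear(1024, 3 * n_coarse))
        # shared MLP of sizes [512, 512, 3] acting on (tiled point, 2D grid, v)
        self.fold = nn.Sequential(nn.Conv1d(3 + 2 + feat_dim, 512, 1), nn.ReLU(),
                                  nn.Conv1d(512, 512, 1), nn.ReLU(),
                                  nn.Conv1d(512, 3, 1))
        # refinement branch: per-point offsets R(v), flattened stand-in for the tree
        self.refine = nn.Sequential(nn.Linear(feat_dim, 1024), nn.ReLU(),
                                    nn.Linear(1024, 3 * n_coarse * r))

    def forward(self, v: torch.Tensor):
        b = v.size(0)
        p_coarse = self.fc(v).view(b, self.n_coarse, 3)       # N x 3 coarse cloud
        tiled = p_coarse.repeat_interleave(self.r, dim=1)     # rN x 3 tiled points
        side = int(self.r ** 0.5)                             # 2D folding grid seeds
        gs = torch.linspace(-0.05, 0.05, side, device=v.device)
        grid = torch.stack(torch.meshgrid(gs, gs, indexing="ij"), -1).reshape(-1, 2)
        grid = grid.repeat(self.n_coarse, 1).unsqueeze(0).expand(b, -1, -1)
        vv = v.unsqueeze(1).expand(-1, tiled.size(1), -1)
        m = self.fold(torch.cat([tiled, grid, vv], dim=2).transpose(1, 2)).transpose(1, 2)
        p_dense = tiled + m                                   # add M' to tiled coords
        p_refined = p_dense + self.refine(v).view(b, -1, 3)   # Eq. (4): R(v) + P_dense
        return p_coarse, p_dense, p_refined
```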
and selecting the chamfer distance as the CD between the complement point cloud and the real point cloud which are output by the complement loss calculation, wherein the calculation process is expressed by the following formula:
wherein P is c Complement point cloud for output, P gt For a real point cloud, x and y are P respectively c And P gt In (1), calculate P in the first term c Is mapped to P gt The euclidean distance in (c) is averaged after the nearest point and vice versa. Thus, P c And P gt Is not required to be the same size.
The overall loss function of the network model is defined as follows:

Loss(P_coarse, P_dense, P_gt) = d_CD1(P_coarse, P_gt) + α·d_CD2(P_dense, P_gt) ⑹

wherein d_CD1 is the chamfer distance between the coarse point cloud P_coarse and the true point cloud P_gt, and d_CD2 is the chamfer distance between the dense point cloud P_dense and the true point cloud P_gt; the Loss value is inversely proportional to completion performance.
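A minimal dense-pairwise sketch of the chamfer distance of Eq. ⑸ and the total loss of Eq. ⑹, assuming PyTorch; production implementations typically use specialized CUDA kernels instead of torch.cdist:

```python
import torch

def chamfer_distance(pc: torch.Tensor, pgt: torch.Tensor) -> torch.Tensor:
    """pc: (B, N1, 3), pgt: (B, N2, 3); N1 and N2 may differ."""
    dist = torch.cdist(pc, pgt)                  # (B, N1, N2) Euclidean distances
    return (dist.min(dim=2).values.mean(dim=1)   # each pc point -> nearest gt point
            + dist.min(dim=1).values.mean(dim=1))  # each gt point -> nearest pc point

def total_loss(p_coarse, p_dense, p_gt, alpha: float = 1.0):
    # Eq. (6): d_CD1(P_coarse, P_gt) + alpha * d_CD2(P_dense, P_gt)
    return (chamfer_distance(p_coarse, p_gt)
            + alpha * chamfer_distance(p_dense, p_gt)).mean()
```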
Step 3: inputting the PCN dataset obtained in step 1 into the model obtained in step 2 to perform completion of the incomplete point clouds;
Specifically, the running environment is Ubuntu 18.04; the network is trained for 400 epochs with an Adam optimizer at an initial learning rate of 0.0001, the batch size is set to 32, and the learning rate decays by a factor of 0.7 every 50 iterations. The results were obtained through multiple tests; see Tables 1, 2 and 3.
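The stated schedule maps onto a standard PyTorch setup as in the sketch below; the model and data loader are hypothetical placeholders, and the 0.7 decay is stepped per epoch-group of 50 here (the text says "iterations", so this granularity is an assumption):

```python
import torch

# Placeholders (assumptions): LVPC-Net assembled from the sketches above,
# and a loader yielding (partial, ground_truth) PCN pairs.
model = build_lvpc_net()                          # hypothetical constructor
train_loader = make_pcn_loader(batch_size=32)     # hypothetical data loader

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.7)

for epoch in range(400):
    for partial, gt in train_loader:
        p_coarse, p_dense, p_refined = model(partial)
        loss = total_loss(p_coarse, p_dense, gt)  # from the previous sketch
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()                              # learning rate x0.7 every 50 steps
```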
TABLE 1 Comparison of point cloud completion results (CD) for different networks
Verification on the PCN dataset uses CD (×10³), F-score (%), single-frame completion time (ms), FPS and the speedup ratio (Speedup) as evaluation metrics. The smaller the CD value, the closer the predicted completion result is to the shape of the real point cloud. As can be seen from Table 1, the CD value of LVPC-Net is better than most current point cloud completion networks: the average CD value of LVPC-Net is reduced by 8.31% compared with the other networks (except PMPNet++). Although its average CD value is 0.77 higher than the best-performing PMPNet++, it still reaches more than 90% of that network's performance and ranks near the front among completion networks. Compared with other networks, LVPC-Net improves performance in the feature extraction stage by adding the one-dimensional channel attention module (ODAM), so that the important features of the input point cloud are better utilized, ensuring effective prediction of the completed point cloud.
TABLE 2 Point cloud completion results for different networks (F-score)
The greater the F-score value, the higher the accuracy of the network's predicted completion results. As can be seen from Table 2, the F-score of LVPC-Net is better than that of most point cloud completion networks: its average F-score is improved by 2.14% over the other networks (except PMPNet++) and is 2.19% lower than the optimal PMPNet++, still achieving 95% of that network's performance and remaining at an advanced level. Because the original offsets of the input point cloud are preserved while refining and generating the dense point cloud, network accuracy is improved, which greatly helps the point cloud completion.
TABLE 3 Comparison of point cloud completion time (ms, Speedup) for different networks
As can be seen from Table 3, the completion time of LVPC-Net is far less than that of the existing accuracy-centered point cloud completion networks; the average speedup ratio reaches 10.36, with the highest speedup, 12.21, obtained against PMPNet++. In high-real-time scenarios such as autonomous driving, tasks are always time-critical. PCN, GRNet, PMPNet and PMPNet++ generally run at less than 10 FPS, which hardly meets the real-time requirement of autonomous driving. Because the one-shot channel pruning algorithm deletes some unimportant channels, LVPC-Net is lighter and more efficient and completes a point cloud in only 15 ms, i.e., about 67 FPS during autonomous driving, which is sufficient to meet the real-time requirement; although the accuracy decreases, it remains completely within an acceptable range.
FIG. 5 is a visual comparison of completion results under different networks, from which it can be seen that LVPC-Net preserves fine details in the completed results. PCN focuses only on the global features of the point cloud in the feature extraction stage, so its completion results are still deficient in detail, and regressing a large number of points with fully connected layers can produce unevenly distributed points, as in the completion of the desk lamp, sofa and table models, where many noise points appear. The completion process of GRNet uses information from two different data modalities (point cloud and voxels); however, GRNet's voxel representation can only reconstruct low-resolution shapes, so for a structural model such as a car with smooth curved surfaces the completion result is uneven and contains many noise points. For PMPNet, since the point cloud is completed by moving each point of the incomplete input, gaps in the completed missing regions are too large and the points are unevenly distributed, as in the airplane and car models; many noise points even appear when completing the chair model. PMPNet++ adds a Transformer module on the basis of PMPNet to enhance the ability to learn point features, aiming to predict a more accurate displacement for each point, but it still has shortcomings in some details, such as missing rear-view mirrors in the car completion results. The method of the present invention obtains comparatively good completion results in all of the above cases, which shows that LVPC-Net learns the key point features of the input point cloud more effectively through the channel attention mechanism, greatly helping the completion results; meanwhile, in the point cloud refinement stage, the structure refinement decoder is introduced to obtain the offsets of the original input point cloud, so that the network retains the detail features of the input point cloud during completion, making the completion results finer.
It is noted that embodiments of the present invention may be provided as methods, systems, and/or computer program products. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Various aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.
What is not described in detail in the present specification belongs to the prior art known to those skilled in the art. While the foregoing describes illustrative embodiments of the present invention to facilitate an understanding of the present invention by those skilled in the art, it should be understood that the present invention is not limited to the scope of the embodiments, but is to be construed as protected by the accompanying claims insofar as various changes are within the spirit and scope of the present invention as defined and defined by the appended claims.
Claims (10)
1. A lightweight point cloud completion method combining channel pruning and channel attention, characterized by comprising the following steps:
step 1, acquiring point cloud data and sampling and classifying the point cloud data;
step 2, constructing a lightweight point cloud completion network model combining channel pruning and channel attention;
and step 3, inputting the point cloud data obtained in step 1 into the model obtained in step 2 to perform point cloud completion.
2. The method of claim 1, wherein the network model in step 2 follows the encoder-decoder structure and uses channel pruning globally to improve completion efficiency;
in the encoder-decoder, the encoder embeds a one-dimensional channel attention module on top of global feature extraction of the input point cloud, enhancing the characterization of local features by adaptively adjusting the weights of the global features;
in the encoder-decoder, the decoder adopts a dual-decoder structure of a semantic decoder and a structure refinement decoder; the semantic decoder generates a complete dense coarse point cloud, and the structure refinement decoder passes the feature vector v through shared multi-layer perceptrons to generate sub-features, fuses them, and finally outputs the offset of each point relative to the original incomplete point cloud; the offsets are added to the generated dense coarse point cloud to generate the final refined point cloud.
3. The lightweight point cloud completion method combining channel pruning and channel attention according to claim 2, wherein the encoder embeds a one-dimensional channel attention module on top of global feature extraction of the input point cloud, and the network globally uses one-shot channel pruning based on the L1 norm; the L1 norm is the sum of the absolute values of the elements of the vector x, specifically defined as:

‖x‖₁ = Σᵢ |xᵢ| (1)

wherein, through the sparsity induced by the L1 norm, channels with small contribution are deleted and only channels with large contribution are retained, finally yielding the pruned new convolution, thereby reducing parameters with little precision loss and greatly improving completion efficiency.
4. The lightweight point cloud completion method combining channel pruning and channel attention according to claim 2, wherein the one-dimensional channel attention module in step 2 is specifically as follows: the input point cloud is expressed as an m×3 matrix P, which is input into the attention module to generate a feature map that strengthens channel weights, where m is the number of input points and 3 represents the x, y, z coordinates of each point, specifically defined as:

ε = σ(CONV(P)) (2)

wherein CONV denotes a one-dimensional convolution and σ is the activation function, producing values between 0 and 1 that represent the importance of the different channels; corresponding weights are then assigned to the channels, and ε represents the importance of each feature channel;

the result ε is multiplied by the matrix P to obtain the final output E, specifically defined as:

E = φ(P) (3)

wherein E is the feature matrix after feature mapping and φ(·) denotes the feature mapping.
5. The lightweight point cloud completion method combining channel pruning and channel attention according to claim 2, wherein the encoder embeds a one-dimensional channel attention module on top of global feature extraction of the input point cloud and uses it for feature extraction, the specific process being: the one-dimensional channel attention module is embedded in two stacked PointNet layers to extract the geometric information of the input point cloud; each PointNet layer comprises a shared multi-layer perceptron and a max-pooling layer as basic modules;

(1) in the first PointNet layer, the matrix P is passed through the shared multi-layer perceptron to learn each point feature pᵢ; these point features form the feature matrix F, each row of which is a learned pᵢ, and F is multiplied point by point with a feature matrix E of the same size obtained from the one-dimensional channel attention module;

(2) a 256-dimensional global feature g is then obtained through a point-wise max-pooling layer; in the second PointNet layer, the global feature g and the feature matrix are taken as inputs, g is concatenated with each individual point feature pᵢ and expanded to generate an augmented point feature matrix F̃;

(3) F̃ is processed by another shared multi-layer perceptron similar to the first PointNet layer, followed by point-wise max pooling;

(4) the extracted global feature vector is v ∈ Rᵏ, where k = 1024.
6. The lightweight point cloud completion method combining channel pruning and channel attention according to claim 2, wherein the decoder adopts a dual-decoder structure of a semantic decoder and a structure refinement decoder and uses it to complete the input point cloud, the specific completion process comprising:

(1) the feature vector v is passed into a semantic densification decoder and a structure refinement decoder respectively;

(2) the semantic densification decoder generates a sparse point cloud with a complete geometric surface using three fully connected layers: it outputs a final vector of 3N units and reshapes it into an N×3 coarse point cloud P_coarse; the points in P_coarse are then tiled to produce a dense point set P′_coarse of size rN×3, where r is the upsampling rate; next, to fully exploit the characteristics of the input point cloud, the network concatenates the features of P′_coarse to obtain new aggregated features and passes them through a shared multi-layer perceptron with layer sizes [512, 512, 3] to generate a new rN×3 matrix M′; finally, the coordinates of each point in P′_coarse are added to the matrix M′ to generate a dense point cloud P_dense of size rN×3;

(3) the structure refinement decoder comprises a root node N₀ for receiving the feature vector v, which uses M₁ multi-layer perceptrons to generate M₁ feature vectors of dimension C, corresponding to the M₁ child nodes of the first layer of the hierarchy; the feature vector of each node at level i ≥ 1 is then concatenated with the global feature v generated by the encoder and further processed by M_{i+1} multi-layer perceptrons to generate M_{i+1} sub-features for each node of level i+1; all nodes on a given layer are processed by the same shared multi-layer perceptron Mᵢ; at the last layer of the tree structure, the feature vector of dimension C = 3 generated for each leaf node serves as the offset of the original point cloud and is added to the dense point cloud P_dense to finally generate the complete refined point cloud P_refined, specifically defined as:

P_refined = R(v) + P_dense (4)
7. The lightweight point cloud completion method combining channel pruning and channel attention according to claim 2, wherein the loss function of the network is defined as the topological distance between the completion target and the ground truth; the chamfer distance CD and the earth mover's distance EMD are two permutation-invariant metrics for comparing unordered point clouds, and CD is selected as the completion loss; the CD between the output completed point cloud and the real point cloud is computed as shown in the following formula:

d_CD(P_c, P_gt) = (1/|P_c|) Σ_{x∈P_c} min_{y∈P_gt} ‖x − y‖₂ + (1/|P_gt|) Σ_{y∈P_gt} min_{x∈P_c} ‖x − y‖₂ (5)

wherein d_CD is the chamfer distance, P_c is the output completed point cloud, P_gt is the real point cloud, and x and y are points of P_c and P_gt respectively; the first term maps each point of P_c to its nearest Euclidean-distance point in P_gt and averages, and the second term maps each point of P_gt to its nearest Euclidean-distance point in P_c and averages; P_c and P_gt are not required to be the same size.
8. The lightweight point cloud completion method combining channel pruning and channel attention according to claim 7, wherein, with the chamfer distance selected as the completion loss computing the CD between the output completed point cloud and the real point cloud, the overall loss function of the network model is defined by the following formula:

Loss(P_coarse, P_dense, P_gt) = d_CD1(P_coarse, P_gt) + α·d_CD2(P_dense, P_gt) (6)

wherein d_CD1 is the chamfer distance between the coarse point cloud P_coarse and the true point cloud P_gt, and d_CD2 is the chamfer distance between the dense point cloud P_dense and the true point cloud P_gt; the Loss value is inversely proportional to completion performance.
9. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 8.
10. A computer device comprising a memory and a processor, the memory storing a computer program capable of running on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 8.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311604390.9A | 2023-11-28 | 2023-11-28 | Light-weight point cloud completion method combining channel pruning and channel attention |

Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN117635488A | 2024-03-01 |

Cited By (1)

| Publication Number | Priority Date | Publication Date | Assignee | Title |
|---|---|---|---|---|
| CN118411389A | 2024-07-02 | 2024-07-30 | 厦门市礼小签电子科技有限公司 | Virtual space entity behavior prediction method based on point cloud identification |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |