CN112990336A - Deep three-dimensional point cloud classification network construction method based on competitive attention fusion - Google Patents

Deep three-dimensional point cloud classification network construction method based on competitive attention fusion

Info

Publication number
CN112990336A
CN112990336A (application CN202110347537.5A)
Authority
CN
China
Prior art keywords
point cloud
fusion
feature
dimensional
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110347537.5A
Other languages
Chinese (zh)
Other versions
CN112990336B (en)
Inventor
达飞鹏
陈涵娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202110347537.5A priority Critical patent/CN112990336B/en
Publication of CN112990336A publication Critical patent/CN112990336A/en
Application granted granted Critical
Publication of CN112990336B publication Critical patent/CN112990336B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for constructing a deep three-dimensional point cloud classification network based on competitive attention fusion. First, the original point cloud is preprocessed to obtain the input point cloud; high-dimensional features are then extracted through two competitive attention fusion feature abstraction layers and finally fed into a classifier to obtain classification scores. Each competitive attention fusion feature abstraction layer first obtains high-dimensional features of its input data through a feature extraction layer, then feeds the high-dimensional features together with the original input data into the CAF module for feature fusion, and takes the fused features as the module output. The core CAF module of the invention focuses on extracting and fusing global features of different levels and on measuring the intrinsic similarity of features; it can be embedded into different point cloud classification networks, offering portability and extensibility, improving the network's ability to express global features, and noticeably enhancing the model's robustness against noise.

Description

Deep three-dimensional point cloud classification network construction method based on competitive attention fusion
Technical Field
The invention relates to a method for constructing a deep three-dimensional point cloud classification network based on competitive attention fusion, belongs to the technical field of three-dimensional point cloud classification in computer vision, and is particularly suitable for point cloud classification tasks subject to noise interference.
Background
In computer vision applications, the analysis and processing of two-dimensional images sometimes fails to meet the requirements of practical applications. In many application scenarios, three-dimensional point cloud data largely compensates for the spatial structure information that two-dimensional images lack. With the development of deep learning and neural networks, research on three-dimensional point clouds has shifted from low-dimensional geometric features to high-dimensional semantic understanding. Many recent studies adopt learning methods based on deep neural networks, and such methods can be further classified according to how the three-dimensional data is represented: methods based on manual feature preprocessing, multi-view methods, voxel-based methods, and methods operating on raw point cloud data.
Raw three-dimensional data is simple to express and better preserves the original three-dimensional representation of an object. Using three-dimensional point clouds as input avoids the adverse factors introduced by feeding regular data such as multi-views and voxels into a convolutional network, for example unnecessary volume partitioning and the loss of invariance of the point cloud data. Owing to the acquisition equipment and the coordinate system, the ordering of acquired three-dimensional point cloud data varies greatly. For the problem of classifying and segmenting unordered point cloud data, the PointNet network creatively proposed to process sparse, unstructured point clouds directly, obtaining global features with a multilayer perceptron and max pooling. Since then, researchers have proposed many PointNet-based network frameworks such as PointNet++, PCPNet and SO-Net. In addition, for the classification and segmentation of three-dimensional point cloud data, other studies have proposed well-known network frameworks such as PointCNN, DensePoint, Point2Sequence, A-CNN and PointWeb, and other methods adopt graph convolution networks to learn local graphs or geometric elements. These methods still have problems, however, such as the lack of explicit local-to-global semantic abstraction, or high complexity.
Research on deep three-dimensional point cloud classification networks addresses the main difficulties in point cloud feature extraction, aiming to improve the classification accuracy and efficiency of models and to enhance their robustness. Optimizing feature extraction capability and improving resistance to interference factors such as perturbation, outliers and random noise are two very important research hotspots in point cloud processing tasks; they are key problems to be solved urgently and have a very important influence on the three-dimensional point cloud classification task and its applications.
Disclosure of Invention
The technical problem is as follows: in order to improve the extraction and expression capability of a three-dimensional point cloud deep network classification model for global features and to enhance the model's robustness to noise interference, the invention provides a method for constructing a deep three-dimensional point cloud classification network based on competitive attention fusion. The core technology of the method is a proposed CAF module (Competitive Attention Fusion Block) that learns the global representation of multi-level features and the intrinsic similarity of intermediate features and redistributes the weights of the intermediate feature channels. The module is independent and transferable, has better global feature extraction capability, focuses on the core backbone features that are more beneficial to three-dimensional point cloud shape classification, and resists the influence of point cloud perturbation, outlier noise and random noise to a certain extent.
The technical scheme is as follows: in order to achieve the purpose, the invention adopts the technical scheme that:
a depth three-dimensional point cloud classification network construction method based on competitive attention fusion comprises the following steps:
Step 1: preprocessing the original point cloud data;
Step 2: constructing a CAF module to form a competitive attention fusion feature abstraction layer;
Step 3: stacking two competitive attention fusion feature abstraction layers to construct the deep three-dimensional point cloud classification network;
Step 4: feeding the high-dimensional features finally output by the second competitive attention fusion feature abstraction layer into a classifier to obtain the classification result.
Further, the preprocessing of the original point cloud data in step 1 comprises the following steps:
B samples are processed in parallel in batches, and the N raw points of each sample are preprocessed; specifically, each sample is down-sampled to obtain a sampling result $P_{Sample}$ containing $N_0$ points.
Further, the construction of the competitive attention fusion feature abstraction layer in step 2 specifically comprises the following steps:
The competitive attention fusion feature abstraction layer consists of a feature extraction layer and a CAF module. First, the feature extraction layer receives the input data $D_{in}$ of the competitive attention fusion feature abstraction layer and extracts high-dimensional features $F_{ext}$ of the input data through a multilayer convolution network; the input data $D_{in}$ and the high-dimensional features $F_{ext}$ are then taken together as the input of the CAF module, in which feature fusion is performed;
the CAF module comprises an MFSE sub-module (i.e., a Multi-layer Feature Squeeze Excitation sub-module, namely a Multi-layer Feature Squeeze and Excitation Block for short) and a FICSA sub-module (i.e., a Feature intrinsic Self-Attention sub-module, namely a Feature intrinsic Connection Self-Attention Block for short), wherein:
The MFSE submodule focuses on the extraction and fusion of global features of different levels. The MFSE submodule performs pooling and encoding operations separately on the CAF module's input data $D_{in} \in \mathbb{R}^{N_1 \times C_1}$ and high-dimensional features $F_{ext} \in \mathbb{R}^{N_2 \times C_2}$, where $\mathbb{R}$ is the set of real numbers, $\mathbb{R}^{N_i \times C_i}$ denotes a two-dimensional matrix of dimension $N_i \times C_i$ over the reals, $N_i$ is the number of points of the sample at the current stage, $C_i$ is the number of feature channels of the sample at the current stage, and $i$ is the serial number of the 5 stages with different matrix dimensions. The encoded features $F_{MFSE\text{-}in} \in \mathbb{R}^{N_3 \times C_3}$ (where $N_3 = 1$ is the number of points of $F_{MFSE\text{-}in}$ and $C_3 = C_1/r$ is its number of feature channels) and $F_{MFSE\text{-}ext} \in \mathbb{R}^{N_4 \times C_4}$ (where $N_4 = 1$ is the number of points of $F_{MFSE\text{-}ext}$ and $C_4 = C_2/r$ is its number of feature channels) are obtained as follows:

$$F_{MFSE\text{-}in} = \phi(P(D_{in})), \qquad F_{MFSE\text{-}ext} = \phi(P(F_{ext})) \tag{1}$$

where $P(\cdot)$ is the global max pooling function, $\phi(\cdot)$ is a fully connected layer with ReLU activation, and the channel scaling ratio $r$ is used to adjust the number of intermediate channels;
Then, the two encoded features are stacked along the channel direction to obtain the stacking result $F_{MFSE\text{-}Concat} \in \mathbb{R}^{N_5 \times C_5}$, where $N_5 = 1$ is the number of points of $F_{MFSE\text{-}Concat}$ and $C_5 = (C_1 + C_2)/r$ is its number of feature channels:

$$F_{MFSE\text{-}Concat} = \mathrm{Concat}(F_{MFSE\text{-}in},\, F_{MFSE\text{-}ext}) \tag{2}$$
Then, the number of channels and the feature map size of the stacking result are expanded through a fully connected layer to the same dimensions as the high-dimensional features $F_{ext}$, and this feature is taken as the output $F_{MFSE}$ of the MFSE submodule:

$$F_{MFSE} = \tilde{\phi}(F_{MFSE\text{-}Concat}) \tag{3}$$

where $\tilde{\phi}(\cdot)$ is the fully connected layer expansion with the Sigmoid normalization function, and $F_{MFSE} \in \mathbb{R}^{N_2 \times C_2}$ is the global attention weight finally obtained by the MFSE submodule;
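For illustration, a minimal TensorFlow sketch of equations (1)-(3) follows; the class name MFSE and the exact layer configuration are assumptions consistent with the description above rather than the patent's actual implementation:

```python
import tensorflow as tf

class MFSE(tf.keras.layers.Layer):
    """Sketch of eqs. (1)-(3): squeeze-and-excitation over two feature levels."""
    def __init__(self, c1: int, c2: int, r: int = 4):
        super().__init__()
        self.fc_in  = tf.keras.layers.Dense(c1 // r, activation="relu")  # phi for D_in
        self.fc_ext = tf.keras.layers.Dense(c2 // r, activation="relu")  # phi for F_ext
        self.expand = tf.keras.layers.Dense(c2, activation="sigmoid")    # expansion with Sigmoid

    def call(self, d_in, f_ext):
        # P(.): global max pooling over the point dimension, then encode, eq. (1)
        f_mfse_in  = self.fc_in(tf.reduce_max(d_in, axis=1))    # (B, C1/r)
        f_mfse_ext = self.fc_ext(tf.reduce_max(f_ext, axis=1))  # (B, C2/r)
        concat = tf.concat([f_mfse_in, f_mfse_ext], axis=-1)    # eq. (2): channel stacking
        return self.expand(concat)  # eq. (3): (B, C2) global attention weight F_MFSE
```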
the FICSA sub-module aims at measuring the intrinsic similarity of the features, and the FICSA sub-module inputs the high-dimensional features of the CAF module
Figure BDA0003001261620000036
Performing 1 × 1 point-to-point convolution operation, and linearly mapping the features of all channels of each point to three parallel high-dimensional features, wherein the formula is as follows:
Figure BDA0003001261620000037
wherein V (·), Q (·) and K (·) are three independent feature mapping functions respectively to obtain three corresponding advanced features, and the dimensions are N2×C2,wiFor different linear transformation coefficients, subsequently, similarity calculation is carried out, and the correlation between Q (-) and K (-) is obtained through dot product operation, wherein the formula is as follows:
Figure BDA0003001261620000038
where $A(\cdot)$ is the high-dimensional relation within the intermediate features, $\gamma$ is the Softmax normalization function with an aggregation role, and an optional channel scaling coefficient is set to reduce the number of training parameters; finally, the global attention weight $F_{FICSA}$ encoding the internal associations among feature points is obtained:

$$F_{FICSA} = \gamma(A(F_{ext})\, V(F_{ext})) \tag{6}$$

where $V(\cdot)$ is used to adjust the feature channel dimension of $A(\cdot)$, and this feature is taken as the final output $F_{FICSA} \in \mathbb{R}^{N_2 \times C_2}$ of the FICSA submodule;
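Likewise, a minimal sketch of equations (4)-(6) follows; placing the optional channel scaling inside Q(.) and K(.) is an assumption, since the text leaves its exact position open:

```python
import tensorflow as tf

class FICSA(tf.keras.layers.Layer):
    """Sketch of eqs. (4)-(6): point-wise self-attention over F_ext."""
    def __init__(self, c2: int, s: int = 4):
        super().__init__()
        self.v = tf.keras.layers.Conv1D(c2, 1)       # V(.): 1x1 point-wise convolution
        self.q = tf.keras.layers.Conv1D(c2 // s, 1)  # Q(.), scaled by s (assumed placement)
        self.k = tf.keras.layers.Conv1D(c2 // s, 1)  # K(.)

    def call(self, f_ext):  # f_ext: (B, N2, C2)
        v, q, k = self.v(f_ext), self.q(f_ext), self.k(f_ext)  # eq. (4)
        a = tf.matmul(q, k, transpose_b=True)          # eq. (5): (B, N2, N2) relation matrix A
        return tf.nn.softmax(tf.matmul(a, v), axis=1)  # eq. (6): gamma(A V) -> F_FICSA
```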
Finally, the CAF module performs competitive weight fusion of the MFSE submodule output $F_{MFSE}$ and the FICSA submodule output $F_{FICSA}$, introduces residual learning, and redistributes the weights of the feature channels:

$$F_{CAF} = \alpha F_{MFSE} + \beta F_{FICSA} \tag{7}$$

Through matrix addition, the global attention weights are fused according to the different proportionality coefficients $\alpha$ and $\beta$ to obtain the final weight distribution coefficients $F_{CAF} \in \mathbb{R}^{N_2 \times C_2}$; the output $F_{Fusion} \in \mathbb{R}^{N_2 \times C_2}$ of the CAF module is then obtained by weight redistribution and residual connection:

$$F_{Fusion} = F_{ext} + F_{CAF} \cdot F_{ext} \tag{8}$$

The output $F_{Fusion}$ of the CAF module is the output of the competitive attention fusion feature abstraction layer.
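The competitive fusion of equations (7) and (8) can then be sketched as follows; broadcasting the per-sample MFSE weight over the point dimension before adding it to the FICSA weight is an assumption implied by the matrix addition in eq. (7):

```python
import tensorflow as tf

def caf_fusion(f_ext, f_mfse, f_ficsa, alpha: float = 1.0, beta: float = 1.0):
    """Sketch of eqs. (7)-(8); alpha and beta default to the embodiment's values."""
    # eq. (7): competitive fusion of the two global attention weights
    f_caf = alpha * f_mfse[:, tf.newaxis, :] + beta * f_ficsa  # (B, N2, C2)
    # eq. (8): channel re-weighting plus residual connection
    return f_ext + f_caf * f_ext
```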
Further, stacking two competitive attention fusion feature abstraction layers in step 3 to construct the deep three-dimensional point cloud classification network specifically comprises the following steps (see the wiring sketch after this paragraph):
The sampling result $P_{Sample}$ from step 1 is fed as input into the first competitive attention fusion feature abstraction layer to obtain the fused feature $F_{Fusion\text{-}Mid}$; the fused feature $F_{Fusion\text{-}Mid}$ is then fed as input into the second competitive attention fusion feature abstraction layer to obtain the final fused feature $F_{Fusion\text{-}Final}$.
Further, feeding the high-dimensional features finally output by the second competitive attention fusion feature abstraction layer into the classifier in step 4 comprises the following steps:
After the second competitive attention fusion feature abstraction layer, a multi-layer perceptron (MLP) is introduced as the classifier, and classification learning is performed on the input fused point cloud features to obtain the classification scores.
Advantageous effects: the invention provides a method for constructing a deep three-dimensional point cloud classification network based on competitive attention fusion, whose core is the CAF (Competitive Attention Fusion) module, a transferable intermediate-feature-channel optimization structure that introduces residual connections and channel competition and redistributes the weights of the feature channels by learning with two kinds of attention at its core. The CAF module contains two submodules: 1) the MFSE submodule, which focuses on the extraction and fusion of global features of different levels; 2) the FICSA submodule, which measures the intrinsic similarity of the intermediate features. The CAF module can be embedded into different point cloud classification networks, offering portability and extensibility; it improves the expression capability of global point cloud features and strengthens the model's robustness to noise interference.
A point cloud feature extraction network typically adopts two or more intermediate feature abstraction layers, and the intermediate features are usually a combination of global and local features, which largely determines the accuracy of the classification result. The CAF module provided by the invention learns a fusion weight from the intermediate output features of the two levels; this weight represents the importance and expressiveness of the current layer's intermediate feature channels, and the optimized intermediate features are obtained by redistributing the channel features with this weight. In brief, the CAF module uses the central idea of the attention mechanism to aggregate salient features, excite the channel features that are more important and have greater influence on the result, suppress invalid or ineffective channel features, reduce noise interference, and improve model robustness.
Noise interference in real point clouds includes perturbations and outliers, which often appear as position offsets of part of a sample's point set, together with background noise. When the model is tested, the set of noise points is also treated as part of the sample and therefore affects the sample's classification result. The role of the CAF module in the network is to make the model focus more on the core features that determine the sample type by adjusting the weights of the intermediate feature channels. The two submodules learn from two different angles (the global features and the associations among the multi-level intermediate features) to obtain weights that better focus the core channels, thereby improving the network's ability to learn global features, enhancing the model's anti-interference capability, and helping to address this difficult problem in point cloud deep networks.
Drawings
FIG. 1 is a flow chart of a method for constructing a deep three-dimensional point cloud classification network based on competitive attention fusion;
FIG. 2 is a schematic diagram of a competitive attention fusion feature abstraction layer provided by the present invention;
FIG. 3 is a schematic diagram of an MFSE sub-module in a CAF module provided by the present invention;
FIG. 4 is a schematic diagram of the FICSA sub-module in the CAF module provided by the present invention;
FIG. 5(a) is the anti-interference performance of the CAF module against point cloud perturbation (Gaussian noise);
FIG. 5(b) is the anti-interference performance of the CAF module against outliers (random noise);
FIG. 6(a) is the effect of the CAF module on model robustness on PointNet++;
FIG. 6(b) is the effect of the CAF module on model robustness on PointASNL;
FIG. 6(c) is the limit anti-interference capability of the CAF module on PointASNL.
Detailed Description
The invention is further elucidated with reference to the drawings and the embodiments.
Under the Ubuntu operating system, TensorFlow is selected as the platform to build the deep three-dimensional point cloud classification network based on competitive attention fusion, and the effectiveness of the CAF module is verified on the classical baseline network PointNet++ and on PointASNL, a baseline network with excellent performance in recent years. The results show that after the CAF module is added, the network's anti-interference capability against point cloud noise is significantly enhanced while the average accuracy of the classification results does not decrease. By adjusting the number of training sample input points, the robustness of the model can be further improved while the classification accuracy remains stable.
A method for constructing a deep three-dimensional point cloud classification network based on competitive attention fusion, whose network framework is shown in FIG. 1. The structure of the competitive attention fusion feature abstraction layer is shown in FIG. 2. FIG. 3 is a schematic diagram of the MFSE sub-module in the CAF module provided by the present invention. FIG. 4 is a schematic diagram of the FICSA sub-module in the CAF module provided by the present invention.
The method specifically comprises the following steps:
Step 1: the original point cloud data is preprocessed. B = 24 samples are processed in parallel per batch, and the N = 10000 raw points of each sample are preprocessed by down-sampling to obtain a sampling result $P_{Sample} \in \mathbb{R}^{1024 \times 3}$ containing $N_0$ = 1024 points.
Step 2: a CAF module is constructed to form the competitive attention fusion feature abstraction layer. The two competitive attention fusion feature abstraction layers, Layer_1 and Layer_2, each consist of two parts. First, the feature extraction layer receives the input data $D_{in}$ of the competitive attention fusion feature abstraction layer: in Layer_1, $D_{in}$ is the sampling result $P_{Sample}$; in Layer_2, the input is the final output result of Layer_1, $F_{Fusion\text{-}Mid}$. High-dimensional features $F_{ext}$ of the input data are then extracted through a multilayer convolution network. The input data $D_{in}$ and the high-dimensional features $F_{ext}$ are taken together as the input of the CAF module, in which feature fusion is performed;
the CAF module comprises an MFSE sub-module and a FICSA sub-module:
The MFSE submodule focuses on the extraction and fusion of global features of different levels. The MFSE submodule performs pooling and encoding operations separately on the CAF module's input data $D_{in}$ and high-dimensional features $F_{ext}$ of the corresponding layer to obtain the encoded features $F_{MFSE\text{-}in}$ and $F_{MFSE\text{-}ext}$:

$$F_{MFSE\text{-}in} = \phi(P(D_{in})), \qquad F_{MFSE\text{-}ext} = \phi(P(F_{ext})) \tag{1}$$

where $P(\cdot)$ is the global max pooling function, $\phi(\cdot)$ is a fully connected layer with ReLU activation, and the channel scaling ratio r = 4 is used to adjust the number of intermediate channels;
Then, the two encoded features are stacked along the channel direction to obtain the stacking result $F_{MFSE\text{-}Concat}$:

$$F_{MFSE\text{-}Concat} = \mathrm{Concat}(F_{MFSE\text{-}in},\, F_{MFSE\text{-}ext}) \tag{2}$$
Then, the number of channels and the feature map size of the stacking result are expanded through a fully connected layer to the same dimensions as the high-dimensional features $F_{ext}$, and this feature is taken as the output $F_{MFSE}$ of the MFSE submodule:

$$F_{MFSE} = \tilde{\phi}(F_{MFSE\text{-}Concat}) \tag{3}$$

where $\tilde{\phi}(\cdot)$ is the fully connected layer expansion with the Sigmoid normalization function, and $F_{MFSE}$ is the global attention weight finally obtained by the MFSE submodule;
The FICSA submodule aims to measure the intrinsic similarity of features. The FICSA submodule applies a 1 × 1 point-wise convolution to the CAF module's high-dimensional feature input $F_{ext}$, linearly mapping the features of all channels of each point to three parallel high-dimensional features:

$$V(F_{ext}) = w_1 F_{ext}, \qquad Q(F_{ext}) = w_2 F_{ext}, \qquad K(F_{ext}) = w_3 F_{ext} \tag{4}$$

where $V(\cdot)$, $Q(\cdot)$ and $K(\cdot)$ are three independent feature mapping functions yielding three corresponding high-level features of dimension $N_2 \times C_2$, and the distinct linear transformation coefficients $w_i$ are learned automatically during training; similarity is then computed, and the correlation between $Q(\cdot)$ and $K(\cdot)$ is obtained through a dot product operation:

$$A(F_{ext}) = Q(F_{ext})\, K(F_{ext})^{T} \tag{5}$$

where $A(\cdot)$ is the high-dimensional relation within the intermediate features, $\gamma$ is the Softmax normalization function with an aggregation role, and an optional channel scaling coefficient is set to reduce the number of training parameters; finally, the global attention weight $F_{FICSA}$ encoding the internal associations among feature points is obtained:
$$F_{FICSA} = \gamma(A(F_{ext})\, V(F_{ext})) \tag{6}$$

where $V(\cdot)$ is used to adjust the feature channel dimension of $A(\cdot)$, and this feature is taken as the final output $F_{FICSA}$ of the FICSA submodule;
Finally, the CAF module performs competitive weight fusion of the MFSE submodule output $F_{MFSE}$ and the FICSA submodule output $F_{FICSA}$, introduces residual learning, and redistributes the weights of the feature channels:

$$F_{CAF} = \alpha F_{MFSE} + \beta F_{FICSA} \tag{7}$$

Through matrix addition, the global attention weights are fused with the proportionality coefficients $\alpha = 1$ and $\beta = 1$ to obtain the final weight distribution coefficients $F_{CAF}$; the output $F_{Fusion}$ of the CAF module is then obtained by weight redistribution and residual connection:

$$F_{Fusion} = F_{ext} + F_{CAF} \cdot F_{ext} \tag{8}$$

The output $F_{Fusion}$ of the CAF module is the output of the competitive attention fusion feature abstraction layer: $F_{Fusion\text{-}Mid}$ for Layer_1 and $F_{Fusion\text{-}Final}$ for Layer_2.
Step 3: the two competitive attention fusion feature abstraction layers Layer_1 and Layer_2 are stacked to construct the deep three-dimensional point cloud classification network. The sampling result $P_{Sample}$ from step 1 is fed as input into the first competitive attention fusion feature abstraction layer Layer_1 to obtain the fused feature $F_{Fusion\text{-}Mid}$; the fused feature $F_{Fusion\text{-}Mid}$ is then fed as input into the second competitive attention fusion feature abstraction layer Layer_2 to obtain the final fused feature $F_{Fusion\text{-}Final}$.
Step 4: the high-dimensional features finally output by the second competitive attention fusion feature abstraction layer Layer_2 are fed into the classifier to obtain the classification result. After Layer_2, a multi-layer perceptron (MLP) is introduced as the classifier, with MLP output channel parameters [256, 512, 1024, 512, 256, 40], and classification learning is performed on the input fused point cloud features to obtain the classification scores.
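As a sketch only, the classifier head could look as follows, using the output channel parameters listed above; the global max pooling before the MLP and the omission of dropout and batch normalization are assumptions:

```python
import tensorflow as tf

def build_classifier(num_classes: int = 40) -> tf.keras.Sequential:
    """MLP classifier with the embodiment's output channels [256, 512, 1024, 512, 256, 40]."""
    return tf.keras.Sequential(
        [tf.keras.layers.Dense(c, activation="relu") for c in (256, 512, 1024, 512, 256)]
        + [tf.keras.layers.Dense(num_classes)]  # classification scores
    )

# Usage sketch: pool the final fused features to one global vector per sample.
# scores = build_classifier()(tf.reduce_max(f_fusion_final, axis=1))
```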
The experimental results are specifically as follows:
experiment 1: and (5) classifying the shapes. The CAF module is added into the Pointnet + +, the optimal classification precision is 90.7% when the Pointnet + + is reproduced, the average test precision reaches 91.0% after the CAF module is added, and the result proves the effectiveness and feasibility of the CAF module in maintaining and improving the classification precision. Adding a CAF module in the PointASNL, and when only coordinate points are input, enabling the classification precision to reach 92.9 percent (92.88 percent) and be not lower than 92.9 percent (92.85 percent of the actual test optimal classification precision) in the PointASNL; when the normal vector is added in training and testing, the classification precision reaches 93.2 percent (93.19 percent) and is not lower than 93.2 percent in PointASNL (the optimal classification precision is 93.15 percent in actual testing). The experimental result proves the independence and the mobility of the CAF module and helps to maintain the classification precision.
Experiment 2: robustness analysis.
Gaussian noise following a standard normal distribution is added to the point cloud to simulate perturbation; random noise in the range [-1.0, 1.0] is added to the point cloud to simulate outliers. The ability of the CAF module to resist perturbation (Gaussian) and outliers (random) is tested using PointASNL as the baseline network (Base). The results are shown in FIG. 5: after the CAF module is added, the model's anti-interference performance against both noise types, point cloud perturbation and outliers, is significantly improved.
A certain number of real points are replaced with random noise in the range [-1.0, 1.0] to simulate simultaneous data loss and noise interference, with the number of random noise points set to [0, 1, 10, 50, 100]. FIG. 6(a) shows the classification accuracy of the PointNet++ network with the CAF module and of the original network on the test set with data loss and random noise; as the amount of noise increases, the classification accuracy of the network with the CAF module decreases more slowly, and the robustness of the model is significantly improved. FIG. 6(b) shows the classification accuracy of the PointASNL network with the CAF module on the test set with data loss and random noise; for the model trained with 1024 input points, under the same conditions the network's anti-interference capability is improved after the CAF module is added, for different amounts of data loss and random noise, and increasing the number of input points to 2048 and 3000 yields better anti-interference capability while maintaining stable classification performance. FIG. 6(c) shows the limit anti-interference capability of the CAF module on PointASNL.
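The two noise models used in these tests can be sketched as follows; the jitter scale sigma is an assumed parameter, since the text specifies a standard normal distribution but not how the perturbation is scaled:

```python
import numpy as np

def add_gaussian_jitter(points: np.ndarray, sigma: float = 0.01) -> np.ndarray:
    """Perturbation: standard-normal jitter added to every point of an (N, 3) cloud."""
    return points + sigma * np.random.randn(*points.shape)

def add_random_outliers(points: np.ndarray, n_outliers: int) -> np.ndarray:
    """Outliers: replace n_outliers real points with uniform noise in [-1.0, 1.0]."""
    noisy = points.copy()
    idx = np.random.choice(points.shape[0], n_outliers, replace=False)
    noisy[idx] = np.random.uniform(-1.0, 1.0, size=(n_outliers, 3))
    return noisy
```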
It should be noted that the above embodiments are only examples for clearly illustrating the invention and do not limit the possible embodiments, which cannot all be enumerated here. All parts not specified in the embodiments can be realized with the prior art. Various modifications and adaptations apparent to those skilled in the art that do not depart from the principles of the invention are intended to fall within the scope of the invention.

Claims (5)

1. A method for constructing a deep three-dimensional point cloud classification network based on competitive attention fusion, characterized by comprising the following steps:
Step 1: preprocessing the original point cloud data;
Step 2: constructing a CAF module to form a competitive attention fusion feature abstraction layer;
Step 3: stacking two competitive attention fusion feature abstraction layers to construct the deep three-dimensional point cloud classification network;
Step 4: feeding the high-dimensional features finally output by the second competitive attention fusion feature abstraction layer into a classifier to obtain the classification result.
2. The method for constructing a deep three-dimensional point cloud classification network based on competitive attention fusion according to claim 1, characterized in that: in the preprocessing of the original point cloud data in step 1, B samples are processed in parallel in batches, and the N raw points of each sample are down-sampled to obtain a sampling result $P_{Sample}$ containing $N_0$ points.
3. The method for constructing a deep three-dimensional point cloud classification network based on competitive attention fusion according to claim 1, characterized in that: the competitive attention fusion feature abstraction layer in step 2 consists of a feature extraction layer and a CAF module, wherein the feature extraction layer receives the input data $D_{in}$ of the competitive attention fusion feature abstraction layer and extracts high-dimensional features $F_{ext}$ of the input data through a multilayer convolution network; the input data $D_{in}$ and the high-dimensional features $F_{ext}$ are taken together as the input of the CAF module, in which feature fusion is performed;
The CAF module comprises an MFSE submodule and a FICSA submodule:
The MFSE submodule focuses on the extraction and fusion of global features of different levels. The MFSE submodule performs pooling and encoding operations separately on the CAF module's input data $D_{in} \in \mathbb{R}^{N_1 \times C_1}$ and high-dimensional features $F_{ext} \in \mathbb{R}^{N_2 \times C_2}$, where $\mathbb{R}$ is the set of real numbers, $\mathbb{R}^{N_i \times C_i}$ denotes a two-dimensional matrix of dimension $N_i \times C_i$ over the reals, $N_i$ is the number of points of the sample at the current stage, $C_i$ is the number of feature channels of the sample at the current stage, and $i$ is the serial number of the 5 stages with different matrix dimensions. The encoded features $F_{MFSE\text{-}in} \in \mathbb{R}^{N_3 \times C_3}$ (where $N_3 = 1$ is the number of points of $F_{MFSE\text{-}in}$ and $C_3 = C_1/r$ is its number of feature channels) and $F_{MFSE\text{-}ext} \in \mathbb{R}^{N_4 \times C_4}$ (where $N_4 = 1$ is the number of points of $F_{MFSE\text{-}ext}$ and $C_4 = C_2/r$ is its number of feature channels) are obtained as follows:

$$F_{MFSE\text{-}in} = \phi(P(D_{in})), \qquad F_{MFSE\text{-}ext} = \phi(P(F_{ext})) \tag{1}$$

where $P(\cdot)$ is the global max pooling function, $\phi(\cdot)$ is a fully connected layer with ReLU activation, and the channel scaling ratio $r$ is used to adjust the number of intermediate channels;
Then, the two encoded features are stacked along the channel direction to obtain the stacking result $F_{MFSE\text{-}Concat} \in \mathbb{R}^{N_5 \times C_5}$, where $N_5 = 1$ is the number of points of $F_{MFSE\text{-}Concat}$ and $C_5 = (C_1 + C_2)/r$ is its number of feature channels:

$$F_{MFSE\text{-}Concat} = \mathrm{Concat}(F_{MFSE\text{-}in},\, F_{MFSE\text{-}ext}) \tag{2}$$
Then, the number of channels and the feature map size of the stacking result are expanded through a fully connected layer to the same dimensions as the high-dimensional features $F_{ext}$, and this feature is taken as the output $F_{MFSE}$ of the MFSE submodule:

$$F_{MFSE} = \tilde{\phi}(F_{MFSE\text{-}Concat}) \tag{3}$$

where $\tilde{\phi}(\cdot)$ is the fully connected layer expansion with the Sigmoid normalization function, and $F_{MFSE} \in \mathbb{R}^{N_2 \times C_2}$ is the global attention weight finally obtained by the MFSE submodule;
the FICSA sub-module aims at measuring the intrinsic similarity of the features, and the FICSA sub-module inputs the high-dimensional features of the CAF module
Figure FDA0003001261610000024
Performing 1 × 1 point-to-point convolution operation, and linearly mapping the features of all channels of each point to three parallel high-dimensional features, wherein the formula is as follows:
Figure FDA0003001261610000025
wherein V (·), Q (·) and K (·) are three independent feature mapping functions respectively to obtain three corresponding advanced features, and the dimensions are N2×C2,wiFor different linear transformation coefficients, subsequently, similarity calculation is carried out, and the correlation between Q (-) and K (-) is obtained through dot product operation, wherein the formula is as follows:
Figure FDA0003001261610000026
where $A(\cdot)$ is the high-dimensional relation within the intermediate features, $\gamma$ is the Softmax normalization function with an aggregation role, and an optional channel scaling coefficient is set to reduce the number of training parameters; finally, the global attention weight $F_{FICSA}$ encoding the internal associations among feature points is obtained:

$$F_{FICSA} = \gamma(A(F_{ext})\, V(F_{ext})) \tag{6}$$

where $V(\cdot)$ is used to adjust the feature channel dimension of $A(\cdot)$, and this feature is taken as the final output $F_{FICSA} \in \mathbb{R}^{N_2 \times C_2}$ of the FICSA submodule;
Finally, the CAF module performs competitive weight fusion of the MFSE submodule output $F_{MFSE}$ and the FICSA submodule output $F_{FICSA}$, introduces residual learning, and redistributes the weights of the feature channels:

$$F_{CAF} = \alpha F_{MFSE} + \beta F_{FICSA} \tag{7}$$

Through matrix addition, the global attention weights are fused according to the different proportionality coefficients $\alpha$ and $\beta$ to obtain the final weight distribution coefficients $F_{CAF}$; the output $F_{Fusion}$ of the CAF module is then obtained by weight redistribution and residual connection:

$$F_{Fusion} = F_{ext} + F_{CAF} \cdot F_{ext} \tag{8}$$

The output $F_{Fusion}$ of the CAF module is the output of the competitive attention fusion feature abstraction layer.
4. The method for constructing a deep three-dimensional point cloud classification network based on competitive attention fusion according to claim 1, characterized in that: the specific method for stacking two competitive attention fusion feature abstraction layers in step 3 is as follows: the sampling result $P_{Sample}$ from step 1 is fed as input into the first competitive attention fusion feature abstraction layer to obtain the fused feature $F_{Fusion\text{-}Mid}$; the fused feature $F_{Fusion\text{-}Mid}$ is then fed as input into the second competitive attention fusion feature abstraction layer to obtain the final fused feature $F_{Fusion\text{-}Final}$.
5. The method for constructing a deep three-dimensional point cloud classification network based on competitive attention fusion according to claim 1, characterized in that: the specific method for feeding the high-dimensional features finally output by the second competitive attention fusion feature abstraction layer into the classifier in step 4 is to introduce a multi-layer perceptron (MLP) as the classifier after the second competitive attention fusion feature abstraction layer and to perform classification learning on the input fused point cloud features to obtain the classification scores.
CN202110347537.5A 2021-03-31 2021-03-31 Deep three-dimensional point cloud classification network construction method based on competitive attention fusion Active CN112990336B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110347537.5A CN112990336B (en) 2021-03-31 2021-03-31 Deep three-dimensional point cloud classification network construction method based on competitive attention fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110347537.5A CN112990336B (en) 2021-03-31 2021-03-31 Deep three-dimensional point cloud classification network construction method based on competitive attention fusion

Publications (2)

Publication Number Publication Date
CN112990336A true CN112990336A (en) 2021-06-18
CN112990336B CN112990336B (en) 2024-03-26

Family

ID=76339112

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110347537.5A Active CN112990336B (en) 2021-03-31 2021-03-31 Deep three-dimensional point cloud classification network construction method based on competitive attention fusion

Country Status (1)

Country Link
CN (1) CN112990336B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117788962A (en) * 2024-02-27 2024-03-29 南京信息工程大学 Extensible point cloud target identification method and system based on continuous learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111242208A (en) * 2020-01-08 2020-06-05 深圳大学 Point cloud classification method, point cloud segmentation method and related equipment
CN112085123A (en) * 2020-09-25 2020-12-15 北方民族大学 Point cloud data classification and segmentation method based on salient point sampling

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111242208A (en) * 2020-01-08 2020-06-05 深圳大学 Point cloud classification method, point cloud segmentation method and related equipment
CN112085123A (en) * 2020-09-25 2020-12-15 北方民族大学 Point cloud data classification and segmentation method based on salient point sampling

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117788962A (en) * 2024-02-27 2024-03-29 南京信息工程大学 Extensible point cloud target identification method and system based on continuous learning
CN117788962B (en) * 2024-02-27 2024-05-10 南京信息工程大学 Extensible point cloud target identification method based on continuous learning

Also Published As

Publication number Publication date
CN112990336B (en) 2024-03-26

Similar Documents

Publication Publication Date Title
CN110287849B (en) Lightweight depth network image target detection method suitable for raspberry pi
CN112396607B (en) Deformable convolution fusion enhanced street view image semantic segmentation method
CN108171701B (en) Significance detection method based on U network and counterstudy
CN110390638B (en) High-resolution three-dimensional voxel model reconstruction method
CN111292330A (en) Image semantic segmentation method and device based on coder and decoder
CN108804397A (en) A method of the Chinese character style conversion based on a small amount of target font generates
CN107229757A (en) The video retrieval method encoded based on deep learning and Hash
CN111259904B (en) Semantic image segmentation method and system based on deep learning and clustering
CN113344188A (en) Lightweight neural network model based on channel attention module
CN108985177A (en) A kind of facial image classification method of the quick low-rank dictionary learning of combination sparse constraint
CN108710906A (en) Real-time point cloud model sorting technique based on lightweight network LightPointNet
CN113240683B (en) Attention mechanism-based lightweight semantic segmentation model construction method
CN109086405A (en) Remote sensing image retrieval method and system based on conspicuousness and convolutional neural networks
CN114648535A (en) Food image segmentation method and system based on dynamic transform
CN111652273A (en) Deep learning-based RGB-D image classification method
CN116310339A (en) Remote sensing image segmentation method based on matrix decomposition enhanced global features
CN112766283A (en) Two-phase flow pattern identification method based on multi-scale convolution network
CN110111365B (en) Training method and device based on deep learning and target tracking method and device
CN117237559A (en) Digital twin city-oriented three-dimensional model data intelligent analysis method and system
CN110728186A (en) Fire detection method based on multi-network fusion
CN113066089B (en) Real-time image semantic segmentation method based on attention guide mechanism
CN112990336A (en) Depth three-dimensional point cloud classification network construction method based on competitive attention fusion
CN104573726B (en) Facial image recognition method based on the quartering and each ingredient reconstructed error optimum combination
CN112668662B (en) Outdoor mountain forest environment target detection method based on improved YOLOv3 network
CN111639751A (en) Non-zero padding training method for binary convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant