CN114358246A - Graph convolution neural network module of attention mechanism of three-dimensional point cloud scene
- Publication number: CN114358246A (application CN202111618088.XA)
- Authority: CN (China)
- Prior art keywords: module; point cloud; attention; dimensional; neural network
- Legal status: Pending (the legal status is an assumption and not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention discloses a graph convolution neural network module with an attention mechanism for three-dimensional point cloud scenes, comprising: an attention graph encoding module (AGEM module) and an attention pooling module (AP module). The module overcomes the poor local-feature extraction capability of existing models and the poor feature aggregation capability of the DGCNN model.
Description
Technical Field
The invention relates to the field of point cloud data, and in particular to a graph convolution neural network module with an attention mechanism for three-dimensional point cloud scenes.
Background
A point cloud is a collection of discrete points in three-dimensional space; compared with an ordinary remote-sensing image, point cloud data carries more spatial information. It is therefore valuable for tasks such as surface monitoring, and research on three-dimensional point cloud data is widely applied across society, chiefly in road segmentation, 3D city modeling, autonomous driving, face recognition, forest monitoring, and the like. This research mainly focuses on mining the three-dimensional information and deep features carried by point cloud data. The field has a long scientific history, and its progression from hand-crafted geometry to deep learning has broadly advanced both basic science and its applications. Because three-dimensional point cloud data contains more spatial information than traditional images, it presents both greater challenges and greater opportunities. Meanwhile, the success of the convolutional neural network (CNN) in image classification, object detection, semantic segmentation, and related tasks spurred the development of deep-learning methods. Inspired by these results, point cloud research has likewise shifted from traditional machine learning to flexible neural network architectures, analyzing point cloud data from a deep-learning perspective for practical industrial and commercial applications.
Hang Su et al process three-dimensional point cloud data by projecting the point cloud into two dimensions. And the existing two-dimensional image processing method is utilized to carry out tasks such as classification and segmentation on the data. Charles et al, derived from mathematical theory, first propose to process point cloud data using symmetric functions to satisfy the invariance of point cloud data. Daniel Maturana et al apply a deep learning method such as CNN by voxelizing the point cloud. Generally, early three-dimensional point cloud data research mainly utilizes theoretical knowledge of various disciplines such as geometry to analyze shallow information. The most advanced methods always imply deep learning represented by Convolutional Neural Networks (CNN). In addition, the method achieves huge achievement and outstanding expression in the aspects of semantic segmentation, classification, target detection and the like.
Undeniably, CNN has become the de facto standard in deep learning. However, its parameter count grows rapidly with the number of convolutional layers, its model size grows with the computing power it demands, and projection and voxelization usually bring huge memory occupation and computational cost. Moreover, because of the sheer volume of multiply-add operations, computational cost remains a bottleneck for industrial applications and cannot meet the industry's real-time requirements.
The second prior art:
SEGCloud divides the overall point cloud into several small point clouds and applies trilinear interpolation and conditional random fields. Charles et al. improved the PointNet network with a method that gradually enlarges the receptive field, which also mitigates its relatively high computational cost. Recently, benefiting from the successful extension of deep learning to graphs and other non-Euclidean structures, graph neural networks have reached state-of-the-art performance in computer vision and attracted the attention of many researchers. Inspired by this, AdaptConv uses a dynamic convolution kernel to make the convolution operation more flexible, and 3D-GCN designs a learnable convolution kernel to acquire local features, showing good learning ability. DGCNN proposes a convolution method named EdgeConv, which dynamically computes the graph structure of each network layer (obtained by the K-NN method) and aggregates the features of the central node in the local graph with the corresponding edge features; this method extracts local features well. With the success of attention in natural language processing, more and more researchers apply it in computer vision: GAPNet, GACNet, and LAE-Conv all design attention modules to obtain point cloud features. However, these graph-based and attention-based point cloud processing methods all use max pooling to aggregate the features of local feature graphs.
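The EdgeConv aggregation described above can be sketched as follows. This is an illustrative NumPy sketch, not DGCNN's actual implementation; the function names are chosen for illustration only:

```python
import numpy as np

def knn_indices(points, k):
    """Indices of the k nearest neighbours of every point (self excluded)."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)  # (N, N)
    return np.argsort(d2, axis=1)[:, 1:k + 1]                      # (N, k)

def edgeconv_layer(feats, k):
    """EdgeConv-style layer: max over edge features concat(x_i, x_j - x_i)."""
    idx = knn_indices(feats, k)                                    # (N, k)
    center = np.repeat(feats[:, None, :], k, axis=1)               # (N, k, C)
    neighb = feats[idx]                                            # (N, k, C)
    edge = np.concatenate([center, neighb - center], axis=-1)      # (N, k, 2C)
    return edge.max(axis=1)                                        # (N, 2C)
```

In DGCNN the edge feature additionally passes through a shared MLP before the max; the sketch keeps only the graph construction and aggregation that the text describes.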
Defects of the second prior art:
The above methods all use a simple max pooling strategy to aggregate local feature information. This causes several disadvantages: important information is filtered out, and valid features cannot be clearly distinguished from invalid ones. Max pooling directly selects the largest value among all the features, so the remaining data contributes nothing to feature extraction during the calculation; the max pooling strategy therefore usually discards much useful information.
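A toy example (illustrative numbers) makes the loss concrete: max pooling keeps exactly one value per channel, and every other value is discarded.

```python
import numpy as np

# Three neighbour features with two channels each.
local_feats = np.array([[0.9, 0.1],
                        [0.8, 0.2],
                        [0.1, 0.7]])

# Max pooling keeps only the largest value per channel ...
pooled = local_feats.max(axis=0)   # [0.9, 0.7]
# ... so 4 of the 6 input values never influence the aggregated feature,
# no matter how informative they were.
```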
Aiming at the defect of poor local-feature extraction capability in existing models, the invention provides a graph convolution neural network module based on an attention mechanism. It uses the K-NN algorithm to obtain the K neighbouring points of each centre point and constructs a local feature graph structure in turn. This structure captures the local topology of the point cloud data and thus represents local features better, well remedying the defect of existing models.
Aiming at the defect of poor feature aggregation capability in the DGCNN model, the invention provides a graph convolution neural network based on an attention pooling strategy. Several neighbouring points of each centre point are obtained by the K-NN algorithm, different attention weights are calculated, and these weights are used to extract the locally most important features of the current input data.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a graph convolution neural network module with an attention mechanism for three-dimensional point cloud scenes, remedying the poor local-feature extraction capability of existing models and the poor feature aggregation capability of the DGCNN model.
The technical scheme provided by the invention is as follows:
a graph convolution neural network module of an attention mechanism of a three-dimensional point cloud scene comprises: an attention graph encoding module, namely an AGEM module, and an attention pooling module, namely an AP module.
Preferably, the AGEM module comprises the following steps:
s1: firstly, acquiring K nearest points of a central point by using a K-NN algorithm through a given K value, and forming a local point cloud set;
s2: then, the input point cloud features are expanded by a Repeat operation to the same size as the k-neighbour point set;
s3: coding the k adjacent point set and the original point set to obtain high-dimensional characteristics;
s4: and splicing the acquired high-dimensional features, and transmitting the high-dimensional features serving as input features into the AP module.
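Steps S1 to S4 can be sketched as follows. This is an illustrative NumPy sketch; the encoding in S3 is assumed here to concatenate centre coordinates, neighbour coordinates, relative coordinates, and relative distance — the text only exemplifies the high-dimensional features, it does not fix their exact form:

```python
import numpy as np

def agem_sketch(xyz, k):
    """Sketch of the AGEM steps for N points with coordinates xyz: (N, 3)."""
    # S1: K-NN - gather the k nearest points of each centre point
    d2 = ((xyz[:, None, :] - xyz[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, 1:k + 1]              # (N, k)
    neighb = xyz[idx]                                     # (N, k, 3)
    # S2: Repeat - lift the centre feature to the size of the k-neighbour set
    center = np.repeat(xyz[:, None, :], k, axis=1)        # (N, k, 3)
    # S3: encode centre/neighbour pairs into higher-dimensional features
    rel = neighb - center                                 # relative coordinates
    dist = np.linalg.norm(rel, axis=-1, keepdims=True)    # relative distance
    # S4: concatenate and hand the result to the AP module as input features
    return np.concatenate([center, neighb, rel, dist], axis=-1)  # (N, k, 10)
```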
Preferably, the AP module comprises: attention weight calculation, an attention weight mask, and an MLP module.
The graph convolution neural network module of the attention mechanism of the three-dimensional point cloud scene has the following beneficial effects:
1. The invention realizes an efficient point cloud classification and segmentation method that surpasses existing methods and therefore has commercial value.
2. For the semantic segmentation task, the model size is only 2.03 M, which suits industrial requirements well.
3. The method applies well to point cloud tasks such as land-change analysis, city modeling, and road segmentation.
4. It can monitor the influence of forest transition and terrain on forest dynamics, and can also classify forest tree species.
Drawings
FIG. 1 is a schematic diagram of the attention graph convolution module of the invention.
FIG. 2 illustrates an AP attention pooling module of the present invention.
FIG. 3 is a data flow diagram of the present invention.
FIG. 4 is a visualization of the results of the method of the invention.
Detailed Description
The following description of the embodiments is provided so that those skilled in the art can understand the invention. It should be understood, however, that the invention is not limited to the scope of the embodiments: to those skilled in the art, various changes are possible without departing from the spirit and scope of the invention as defined in the appended claims, and everything produced using the inventive concept is protected.
The invention is implemented with the existing deep-learning framework PyTorch and corresponding programming libraries, mainly including NumPy, Pandas, tensor utilities, and the like. PyTorch supplies the deep-learning components, including the linear module, the convolution module, the parameter-regularization module, and so on.
The specific scheme is realized according to the following principle:
In the proposed method we convert segmentation into a classification task, performing point-wise classification instead of patch segmentation, as shown in fig. 1. The main component is the attention graph convolution module, which comprises an AGEM (attention graph encoding) module and an AP (attention pooling) module. The AGEM module meets the attention-calculation needs of the AP module by transforming the features of the data. Specifically, the K-NN algorithm first obtains, for a given K value, the K nearest points of each centre point to form a local point cloud set. The input point cloud features are then lifted by the Repeat operation to the same size as the k-neighbour set. Next, the k-neighbour set and the original point set are encoded to obtain high-dimensional features such as relative distance and relative coordinates. Finally, the acquired high-dimensional features are concatenated and passed into the AP module as input features.
The AP module is composed of an attention weight calculation, an attention weight mask, and an MLP module, as shown in fig. 2.
The data flow diagram 3 of this patent is as follows;
Finally, extensive experiments were performed on three widely adopted public datasets: ModelNet40 for object classification, ShapeNet Part for part segmentation, and S3DIS for semantic segmentation. AGNet is comprehensively superior to state-of-the-art methods: on ModelNet40, its accuracy improves on PointNet by 4.2%, on ECC by 6.0%, on VoxNet by 7.5%, and on 3DShapeNet by 8.7%.
The results of the patented method are visualized as shown in fig. 4:
the architecture of the network is implemented as an important protection point, as shown in fig. 1, the network is composed of a single-layer MLP transform, an AGEM (attention-seeking convolutional coding module) and an AP (attention-pooling module), and the specific protection technology is as follows:
single layer MLP transform
Because the dimensions of the input point cloud data differ and the channels carry great information redundancy, a single MLP layer adjusts part of the dimension information while matching the input of the downstream model.
Local feature aggregation module
After the transformed data is obtained, the K-NN algorithm finds the K nearest points for the point cloud features; the semantic information of the retrieved neighbouring points yields high-dimensional features such as relative distance, and local feature graphs are constructed in turn to complete the aggregation of local feature information.
Attention pooling module
The aggregated features are screened and pooled along the channel dimension; an attention mechanism calculates the weights of different features and extracts the important information among them.
Output of
The most common fully-connected linear layer outputs the final classification result. This part is public technology and is not among the key protected techniques of the method.
Claims (3)
1. A graph convolution neural network module of an attention mechanism of a three-dimensional point cloud scene, characterized by comprising: an attention graph encoding module, namely an AGEM module, and an attention pooling module, namely an AP module.
2. The graph convolution neural network module of the attention mechanism of the three-dimensional point cloud scene of claim 1, wherein the AGEM module comprises the following steps:
s1: firstly, acquiring K nearest points of a central point by using a K-NN algorithm through a given K value, and forming a local point cloud set;
s2: then, the input point cloud features are expanded by a Repeat operation to the same size as the k-neighbour point set;
s3: coding the k adjacent point set and the original point set to obtain high-dimensional characteristics;
s4: and splicing the acquired high-dimensional features, and transmitting the high-dimensional features serving as input features into the AP module.
3. The graph convolution neural network module of the attention mechanism of the three-dimensional point cloud scene of claim 1, wherein the AP module comprises: attention weight calculation, an attention weight mask, and an MLP module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111618088.XA CN114358246A (en) | 2021-12-27 | 2021-12-27 | Graph convolution neural network module of attention mechanism of three-dimensional point cloud scene |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114358246A (en) | 2022-04-15
Family
ID=81103709
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111618088.XA Pending CN114358246A (en) | 2021-12-27 | 2021-12-27 | Graph convolution neural network module of attention mechanism of three-dimensional point cloud scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114358246A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110969589A (en) * | 2019-12-03 | 2020-04-07 | 重庆大学 | Dynamic scene fuzzy image blind restoration method based on multi-stream attention countermeasure network |
CN112257597A (en) * | 2020-10-22 | 2021-01-22 | 中国人民解放军战略支援部队信息工程大学 | Semantic segmentation method of point cloud data |
CN113554654A (en) * | 2021-06-07 | 2021-10-26 | 之江实验室 | Point cloud feature extraction model based on graph neural network and classification and segmentation method |
Non-Patent Citations (3)
- WEIPENG JING et al.: "AGNet: An Attention-Based Graph Network for Point Cloud Classification and Segmentation", Remote Sensing, 21 February 2022
- XIN WEN et al.: "CF-SIS: Semantic-Instance Segmentation of 3D Point Clouds by Context Fusion with Self-Attention", ACM, 12 October 2020
- ZHUYANG XIE et al.: "Point Clouds Learning with Attention-based Graph Convolution Networks", arXiv, 31 May 2019
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116129118A (en) * | 2023-01-17 | 2023-05-16 | 华北水利水电大学 | Urban scene laser LiDAR point cloud semantic segmentation method based on graph convolution |
CN116129118B (en) * | 2023-01-17 | 2023-10-20 | 华北水利水电大学 | Urban scene laser LiDAR point cloud semantic segmentation method based on graph convolution |
CN116403058A (en) * | 2023-06-09 | 2023-07-07 | 昆明理工大学 | Remote sensing cross-scene multispectral laser radar point cloud classification method |
CN116403058B (en) * | 2023-06-09 | 2023-09-12 | 昆明理工大学 | Remote sensing cross-scene multispectral laser radar point cloud classification method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||