CN116824143A - Point cloud segmentation method based on bilateral feature fusion and vector self-attention - Google Patents
- Publication number: CN116824143A (application CN202310780811.7A)
- Authority
- CN
- China
- Prior art keywords: point cloud, feature, attention, information, layer
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V10/26 — Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/08 — Learning methods
- G06V10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/80 — Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/82 — Arrangements for image or video recognition or understanding using neural networks
Abstract
The invention discloses a point cloud segmentation method based on bilateral feature fusion and vector self-attention, in the technical field of point cloud semantic segmentation. The method comprises the following steps: inputting original point cloud data; encoding the input point cloud with a bilateral feature fusion module and a vector self-attention module; and decoding the point cloud features, up-sampling them with successive FP layers to obtain the point cloud segmentation result. The invention provides an efficient point cloud semantic segmentation network that makes semantic segmentation of point clouds faster and more accurate, with superior segmentation performance; a correction-information calculation module based on bilateral geometric-semantic feature information, which adjusts edge information, alleviates the edge-ambiguity problem in local-region aggregation, and strengthens the aggregation of local information; and a novel offset vector self-attention module that effectively extracts global features of the point cloud, achieving better global feature extraction while reducing the computational cost of the network.
Description
Technical Field
The invention relates to the technical field of point cloud semantic segmentation, in particular to a point cloud segmentation method based on bilateral feature fusion and vector self-attention, which is suitable for semantic segmentation of indoor point clouds.
Background
As research has deepened, deep-learning methods for processing 3D point clouds have been markedly successful. These methods can generally be divided into three types: projection-based, voxel-based, and point-based. Among them, point-based methods, which process the point set directly with a multi-layer perceptron (MLP), have become mainstream owing to their efficiency and high performance.
Among point-based approaches, PointNet is a classical network. It extracts features with a shared multi-layer perceptron (MLP) and aggregates global features through a symmetric function, making the result invariant to the internal ordering of the points. However, PointNet samples points individually, so it cannot effectively extract local features. PointNet++ addresses this by adding sampling and grouping operations on top of PointNet, giving a multi-level feature extraction method. However, the grouping-based local feature extraction of PointNet++ causes edge ambiguity in local regions: during neighborhood construction, outliers and overlap between neighborhoods are hard to avoid, and they are most prominent where multiple semantic classes meet. Moreover, aggregation regions partitioned by Euclidean distance cannot adapt well to semantic features within a local range of the semantic space, so PointNet++ focuses on extracting geometric information, aggregates local feature information insufficiently, is weak at semantic extraction, and, relying only on FPS, extracts global features inadequately. Compared with PointNet++, the recently proposed PointNeXt focuses on training techniques and scaling strategies to further improve PointNet++'s performance, while PointMLP achieves very high classification performance without any complex local feature extractor by introducing a residual MLP structure. However, these methods concentrate on feature extraction in geometric space, and the edge-ambiguity and insufficient-global-feature problems of PointNet++ remain unsolved.
Some methods instead focus on semantic feature extraction. DGCNN proposes edge convolution (EdgeConv) for learning edge features: it constructs a local neighborhood graph, performs the EdgeConv operation on each adjacent edge, and dynamically updates the graph structure between levels. AdaptiveGraph proposes assigning learned weights to each edge to better evaluate and aggregate information. These methods also adopt similar grouping schemes, so they share the edge-ambiguity problem of PointNet++, and because their feature extraction focuses on the semantic space, the geometric structure is missing in the high-dimensional semantic space. Following its great success in natural language processing and two-dimensional image tasks, the self-attention mechanism has also been applied to three-dimensional point clouds; attention has a strong ability to extract global features but suffers from a large computational cost.
In summary, most current point cloud semantic segmentation networks extract local features by grouping and aggregating the point cloud, and the ambiguity of neighborhood edges within a group is difficult to resolve. Aggregation regions partitioned by Euclidean distance cannot adapt well to semantic features within the local scope of the semantic space, and the geometric structure is missing in the high-dimensional semantic space, so local feature information is aggregated insufficiently. Building multi-scale feature extraction by down-sampling alone loses much detail, and global features cannot be fully extracted. Solving these problems is the technical challenge addressed by the invention.
Disclosure of Invention
Based on the technical problems described above, the invention provides a point cloud segmentation method based on bilateral feature fusion and vector self-attention. It improves the robustness and accuracy of semantic segmentation of point cloud scenes, strengthens the feature-learning ability of the segmentation network, and alleviates the edge ambiguity of local-region features and the insufficient representativeness of global features.
The technical scheme adopted by the invention is as follows:
a point cloud segmentation method based on bilateral feature fusion and vector self-attention comprises the following steps:
s1: inputting original point cloud data;
s2: encoding the input original point cloud data by using a bilateral feature fusion module and a vector self-attention module;
s3: and decoding the point cloud characteristics, and up-sampling the point cloud characteristics by using a continuous FP layer to obtain a point cloud segmentation result.
Further, the input of original point cloud data in step S1 specifically includes:
taking S3DIS as the indoor data set of the test, S3DIS is a large indoor scene segmentation data set and comprises 13 categories and 271 rooms. Each point cloud data has 9 features, namely color information R, G, B, coordinate information x, y, z, and 3 normal vectors. 271 rooms are divided into 6 areas, each room being divided into 1 m x 1 m blocks. Setting an input point cloud position F in Its dimension is [ B, N,9]Wherein B is a batch, N is the number of points, 9 is a feature, and the total number of input features is B.times.N.times.9.
Further, in the step S2, the encoding of the input original point cloud data using the bilateral feature fusion module and the vector self-attention module specifically includes:
The input point cloud is divided into a geometric space, containing the coordinate information of the point cloud with dimension [B, N, 3], and a semantic space, containing the color information and normal vectors with dimension [B, N, 6]. The semantic part is passed through an MLP to map it into semantic space, and both parts are fed into the encoder SA. Sampling points p_i, with corresponding feature representations f_i, are generated from the original data by FPS. Taking the sampling points as centers, the point cloud is grouped by ball query under the three-dimensional Euclidean metric: given a radius r, the ball query finds the points inside the sphere of radius r centered at each FPS sampling point; these are called the neighbor points p_j of the center point p_i. An SA layer and a VA layer are then added to extract point cloud features: the SA layer performs bilateral feature-information fusion and local feature extraction of the point cloud, and the VA layer, an improved self-attention layer, performs global feature extraction of the point cloud.
In the SA layer, the absolute position of the center point and the relative positions of its neighborhood are combined into the geometric-space local feature G(p_i, p_j) = [p_i; p_j − p_i]; likewise, S(f_i, f_j) = [f_i; f_j − f_i] represents the local feature in semantic space. The geometric information is mapped into semantic space by an MLP and converted into a feature mask by a softmax function, which corrects the edge feature information of the semantic space; in the same way, the geometric edge information of the geometric space is dynamically adjusted. The adjusted edge information is added to the original edge information, establishing a residual structure that ensures the robustness of the information optimization. The calculation of this correction information can be formulated as:
f_s = Softmax(MLP(G(p_i, p_j))) * S_e + S_e  (1)
p_s = Softmax(MLP(S(f_i, f_j))) * G_e + G_e  (2)
where G_e(p_i, p_j) = [p_i; p_j − p_i], S_e(f_i, f_j) = [f_i; f_j − f_i], p_i, p_j, p_s ∈ R^(N×3), and f_i, f_j, f_s ∈ R^(N×d).
The obtained supplementary information p_s, f_s is then combined with G(p_i, p_j), S(f_i, f_j) to give G′ = [p_i; p_j − p_i; p_s] and S′ = [f_i; f_j − f_i; f_s], with G′ ∈ R^(N×9) and S′ ∈ R^(N×3×d). These are concatenated into the enhanced local feature F_c, which is fed through two LBR layers for feature extraction to obtain F_c′. Finally, F_c′ is sent through a max-pooling layer to complete the feature aggregation of the local region, yielding the local aggregation feature F_a. After F_a is obtained, the local-region aggregation information is fed into the VA layer for global feature extraction.
In the traditional attention mechanism as applied to point clouds, attention is computed as follows. First, the embedded feature F_a is fed to three separable convolutions of kernel size 1 × 1 to produce three new feature maps F_q, F_k, and F_v. Then F_q is transposed and multiplied by F_k, and an attention matrix of size N × N is generated after the softmax layer:
A = Softmax(F_q^T F_k)  (3)
The vector attention mechanism is introduced into this traditional attention calculation: the attention matrix is generated from a subtraction and two linear layers, followed by a softmax function. The calculation formula is
Attention(F_Q, F_K, F_V) = Softmax(γ(F_Q − F_K)) F_V  (4)
where γ is a mapping function that generates the attention vectors for feature aggregation. The obtained attention matrix F_A is multiplied element-wise with the input feature F_a to give the attention feature. On this basis, an offset attention mechanism is designed, expressed by the following formula:
F_out = Relu(Batchnorm(MLP(F_a − F_A))) + F_a  (5)
which yields the output feature F_out.
Further, the decoding of the point cloud features in step S3 specifically includes:
The decoding part up-samples the point cloud features with successive FP layers to produce the point cloud segmentation. Distance-based interpolation and a hierarchical propagation strategy across skip links are adopted. In a feature propagation level, point features are propagated from N_l × (d + C) points to N_{l−1} points, where N_{l−1} and N_l (with N_l ≤ N_{l−1}) are the point-set sizes of the input and output of the l-th SA layer. Feature propagation is realized by interpolating the feature values f of the N_l points at the coordinates of the N_{l−1} points. Among the many choices of interpolation, an inverse-distance weighted average based on the k nearest neighbors is used. The interpolated features at the N_{l−1} points are then concatenated with the skip-linked point features from the set-abstraction level, passed through a "PointNet unit", and shared fully connected and ReLU layers are applied to update the feature vector of each point. This process is repeated until the features have been propagated to the original set of points.
and finally, obtaining a segmentation result of the point cloud.
The invention provides a point cloud segmentation method based on bilateral feature fusion and an attention mechanism, which, compared with the prior art, has at least the following beneficial effects:
1) The invention provides a high-efficiency point cloud semantic segmentation network, so that the semantic segmentation of the point cloud is quicker and more accurate, and the segmentation performance is superior.
2) The invention provides a group of correction information calculation modules based on geometric semantic bilateral characteristic information, which adjusts edge information, relieves the problem of edge ambiguity in local area aggregation and strengthens the aggregation effect of local information.
3) The invention provides a new offset vector self-attention module, which effectively extracts the global features of the point cloud and obtains better global feature extraction effect on the basis of reducing the network calculation amount.
Drawings
FIG. 1 is a process schematic diagram of a bilateral feature aggregation module;
FIG. 2 is a process schematic of the attention module;
FIG. 3 is a schematic diagram of a point cloud semantic segmentation network process based on bilateral feature fusion and attention;
FIG. 4 is a point cloud semantic segmentation effect diagram.
Detailed Description
The following describes the embodiments of the present invention clearly and completely with reference to the accompanying drawings; the embodiments described are evidently only some, not all, of the embodiments of the invention.
Example 1.
As shown in fig. 1-4, a point cloud segmentation method based on bilateral feature fusion and vector self-attention, the method comprises the following steps:
s1: inputting original point cloud data
S3DIS, a large indoor scene segmentation dataset containing 13 categories and 271 rooms, is used as the indoor test dataset. Each point has 9 features: color information R, G, B; coordinate information x, y, z; and a 3-dimensional normal vector. The 271 rooms are divided into 6 areas, and each room is divided into 1 m × 1 m blocks. The input point cloud F_in has dimension [B, N, 9], where B is the batch size, N is the number of points, and 9 is the number of features per point, so the total input size is B × N × 9;
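The input layout described above can be sketched in NumPy as follows; the array values and the batch/block sizes are illustrative placeholders, not S3DIS data:

```python
import numpy as np

# Illustrative input tensor F_in with dimension [B, N, 9]:
# 9 features per point = xyz coordinates + RGB color + 3-dim normal vector.
B, N = 2, 4096                       # placeholder batch size and points per block
F_in = np.random.rand(B, N, 9).astype(np.float32)

coords = F_in[..., 0:3]              # geometric space: x, y, z  -> [B, N, 3]
sem    = F_in[..., 3:9]              # semantic space: R, G, B + normals -> [B, N, 6]
```

This split into a [B, N, 3] geometric part and a [B, N, 6] semantic part is exactly the bilateral division the encoder consumes in step S2.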
s2: encoding input original point cloud data by using bilateral feature fusion module and vector self-attention module
The input point cloud is divided into a geometric space, containing the coordinate information of the point cloud with dimension [B, N, 3], and a semantic space, containing the color information and normal vectors with dimension [B, N, 6]. The semantic part is passed through an MLP to map it into semantic space, and both parts are fed into the encoder SA. First, sampling points p_i, with corresponding feature representations f_i, are generated from the original data by FPS. Taking the sampling points as centers, the point cloud is grouped by ball query under the three-dimensional Euclidean metric: given a radius r, the ball query finds the points inside the sphere of radius r centered at each FPS sampling point; these are called the neighbor points p_j of the center point p_i. In the model, four consecutive FPS sampling layers construct point cloud sampling sets at the scales N/4, N/16, N/64, and N/128, with grouping radii 0.1, 0.2, 0.4, and 0.8 at the respective layers, and an SA layer and a VA layer are added to extract point cloud features. The SA layer performs bilateral feature-information fusion and local feature extraction of the point cloud, and the VA layer, an improved self-attention layer, performs global feature extraction of the point cloud.
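The FPS sampling and ball-query grouping described above can be sketched as follows. This is a naive illustrative NumPy implementation, not the patent's code; the function names and the padding convention for under-full balls are our own assumptions:

```python
import numpy as np

def farthest_point_sampling(xyz, m):
    """Naive FPS: iteratively pick the point farthest from all points chosen so far."""
    n = xyz.shape[0]
    idx = np.zeros(m, dtype=np.int64)
    dist = np.full(n, np.inf)        # running min-distance to the chosen set
    idx[0] = 0                       # start from an arbitrary point
    for i in range(1, m):
        d = np.linalg.norm(xyz - xyz[idx[i - 1]], axis=1)
        dist = np.minimum(dist, d)
        idx[i] = np.argmax(dist)     # farthest remaining point
    return idx

def ball_query(xyz, centers, r, k):
    """For each center p_i, return up to k neighbor indices p_j within radius r.
    Under-full balls are padded by repeating the first hit (an assumed convention)."""
    groups = []
    for c in centers:
        d = np.linalg.norm(xyz - c, axis=1)
        hits = np.nonzero(d <= r)[0]         # the center itself is always a hit (d = 0)
        pad = np.full(k, hits[0])
        pad[:min(k, len(hits))] = hits[:k]
        groups.append(pad)
    return np.stack(groups)                  # shape [m, k]
```

Stacking four such stages with m = N/4, N/16, N/64, N/128 and r = 0.1, 0.2, 0.4, 0.8 reproduces the multi-scale grouping structure of the encoder.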
In the SA layer, the absolute position of the center point and the relative positions of its neighborhood are combined into the geometric-space local feature G(p_i, p_j) = [p_i; p_j − p_i]; likewise, S(f_i, f_j) = [f_i; f_j − f_i] represents the local feature in semantic space. The geometric information is mapped into semantic space by an MLP and converted into a feature mask by a softmax function, which corrects the edge feature information of the semantic space; in the same way, the geometric edge information of the geometric space is dynamically adjusted. The adjusted edge information is added to the original edge information, establishing a residual structure that ensures the robustness of the information optimization. The calculation of this correction information can be formulated as:
f_s = Softmax(MLP(G(p_i, p_j))) * S_e + S_e  (1)
p_s = Softmax(MLP(S(f_i, f_j))) * G_e + G_e  (2)
where G_e(p_i, p_j) = [p_i; p_j − p_i], S_e(f_i, f_j) = [f_i; f_j − f_i], p_i, p_j, p_s ∈ R^(N×3), and f_i, f_j, f_s ∈ R^(N×d).
The obtained supplementary information p_s, f_s is then combined with G(p_i, p_j), S(f_i, f_j) to give G′ = [p_i; p_j − p_i; p_s] and S′ = [f_i; f_j − f_i; f_s], with G′ ∈ R^(N×9) and S′ ∈ R^(N×3×d). These are concatenated into the local feature F_c, which is fed through two LBR layers (Linear + BatchNorm + ReLU) for feature extraction to obtain F_c′. Finally, F_c′ is sent through a max-pooling layer to complete the feature aggregation of the local region, yielding the local aggregation feature F_a.
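The bilateral correction of Eqs. (1)-(2) can be sketched as below. This is a minimal NumPy sketch under stated assumptions: the shared MLPs are stood in for by single random linear maps, the edge features are laid out per neighborhood ([N, k, ·]), and all dimensions are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mlp(x, out_dim, rng):
    # Hypothetical stand-in for the shared MLP in Eqs. (1)-(2): one linear map + ReLU.
    W = rng.standard_normal((x.shape[-1], out_dim)) / np.sqrt(x.shape[-1])
    return np.maximum(x @ W, 0.0)

rng = np.random.default_rng(0)
N, k, d = 128, 16, 32
G = rng.standard_normal((N, k, 6))       # [p_i; p_j - p_i]: geometric edge feature
S = rng.standard_normal((N, k, 2 * d))   # [f_i; f_j - f_i]: semantic edge feature
G_e, S_e = G, S

# Eq. (1): geometric mask corrects semantic edges; residual add keeps robustness.
f_s = softmax(mlp(G, S.shape[-1], rng)) * S_e + S_e
# Eq. (2): semantic mask corrects geometric edges, symmetrically.
p_s = softmax(mlp(S, G.shape[-1], rng)) * G_e + G_e
```

The corrected edges f_s, p_s would then be concatenated with the originals to form G′ and S′ before the LBR layers and max pooling.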
After the local aggregation feature F_a is obtained, the local-region aggregation information is fed into the VA layer for global feature extraction.
Before introducing the improved attention mechanism, first consider the conventional attention mechanism, which is usually computed in point clouds as follows. First, the embedded feature F_a is fed to three separable convolutions of kernel size 1 × 1 to produce three new feature maps F_q, F_k, and F_v. Then F_q is transposed and multiplied by F_k, and an attention matrix of size N × N is generated after the softmax layer:
A = Softmax(F_q^T F_k)  (3)
The attention module introduces the vector attention mechanism into the traditional attention calculation: the attention matrix is generated from a subtraction and two linear layers, followed by a softmax function. Unlike conventional attention, the attention weights in vector attention are vectors that can modulate individual feature channels, and replacing the matrix product with a subtraction reduces the computation required for the attention matrix. The calculation formula is
Attention(F_Q, F_K, F_V) = Softmax(γ(F_Q − F_K)) F_V  (4)
where γ is a mapping function (e.g. an MLP) that generates the attention vectors for feature aggregation. The obtained attention matrix F_A is multiplied element-wise with the input feature F_a to give the attention feature. Typically, the attention feature is then fed into an MLP layer and the input feature F_a is added through a residual link to give the final output feature. Here, to strengthen the attention weights and reduce the effect of noise, an offset attention mechanism is designed whose principle is to replace the attention feature with the offset between the input of the self-attention module and the attention feature. It can be expressed by the following formula:
F_out = Relu(Batchnorm(MLP(F_a − F_A))) + F_a  (5)
which yields the output feature F_out.
S3: decoding point cloud features
The decoding part up-samples the point cloud features with successive FP layers to produce the point cloud segmentation. Briefly, the FP layer aggregates features back onto the original point cloud; equivalently, it propagates features from the sub-sampled points to the original points.
Distance-based interpolation and a hierarchical propagation strategy across skip links are adopted. In a feature propagation level, point features are propagated from N_l × (d + C) points to N_{l−1} points, where N_{l−1} and N_l (with N_l ≤ N_{l−1}) are the point-set sizes of the input and output of the l-th SA layer. Feature propagation is realized by interpolating the feature values f of the N_l points at the coordinates of the N_{l−1} points. Among the many choices of interpolation, an inverse-distance weighted average based on the k nearest neighbors is used (p = 2, k = 3 by default). The interpolated features at the N_{l−1} points are then concatenated with the skip-linked point features from the set-abstraction level. The concatenated features are passed through a "PointNet unit", similar to one-by-one convolution in CNNs, and shared fully connected and ReLU layers are applied to update the feature vector of each point. This process is repeated until the features have been propagated to the original set of points.
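The inverse-distance weighted interpolation at the heart of the FP layer (k nearest neighbors, p = 2, k = 3 by default) can be sketched as below; this is a minimal NumPy sketch, with the function name and the epsilon guard as our own assumptions:

```python
import numpy as np

def idw_interpolate(xyz_dense, xyz_sparse, feat_sparse, k=3, p=2, eps=1e-12):
    """Propagate features from the sparse (sub-sampled) points to the dense points
    by an inverse-distance weighted average over the k nearest sparse neighbors."""
    out = np.empty((xyz_dense.shape[0], feat_sparse.shape[1]))
    for i, q in enumerate(xyz_dense):
        d = np.linalg.norm(xyz_sparse - q, axis=1)
        nn = np.argsort(d)[:k]                   # k nearest sparse points
        w = 1.0 / (d[nn] ** p + eps)             # inverse-distance weights, w = 1/d^p
        out[i] = (w[:, None] * feat_sparse[nn]).sum(0) / w.sum()
    return out
```

A dense point that coincides with a sparse point receives (almost exactly) that point's feature, since its weight dominates; the interpolated features would then be concatenated with the skip-linked features before the "PointNet unit".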
In the present model, four consecutive FP layers up-sample the point cloud, and finally the segmentation result of the point cloud is obtained, as shown in Table 1.
TABLE 1 qualitative segmentation results
The foregoing is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art, who is within the scope of the present invention, should make equivalent substitutions or modifications according to the technical scheme of the present invention and the inventive concept thereof, and should be covered by the scope of the present invention.
Claims (4)
1. A point cloud segmentation method based on bilateral feature fusion and vector self-attention is characterized by comprising the following steps:
s1: inputting original point cloud data;
s2: encoding the input original point cloud data by using a bilateral feature fusion module and a vector self-attention module;
s3: and decoding the point cloud characteristics, and up-sampling the point cloud characteristics by using a continuous FP layer to obtain a point cloud segmentation result.
2. The point cloud segmentation method based on bilateral feature fusion and vector self-attention as claimed in claim 1, wherein the input original point cloud data in step S1 specifically comprises:
S3DIS, a large indoor scene segmentation dataset containing 13 categories and 271 rooms, is used as the indoor test dataset; each point has 9 features: color information R, G, B, coordinate information x, y, z, and a 3-dimensional normal vector; the 271 rooms are divided into 6 areas, and each room is divided into 1 m × 1 m blocks; the input point cloud F_in has dimension [B, N, 9], where B is the batch size, N is the number of points, and 9 is the number of features per point, so the total input size is B × N × 9.
3. The method for point cloud segmentation based on bilateral feature fusion and vector self-attention according to claim 2, wherein the encoding of the input original point cloud data using the bilateral feature fusion module and the vector self-attention module in step S2 specifically comprises:
The input point cloud is divided into a geometric space, containing the coordinate information of the point cloud with dimension [B, N, 3], and a semantic space, containing the color information and normal vectors with dimension [B, N, 6]. The semantic part is passed through an MLP to map it into semantic space, and both parts are fed into the encoder SA. Sampling points p_i, with corresponding feature representations f_i, are generated from the original data by FPS. Taking the sampling points as centers, the point cloud is grouped by ball query under the three-dimensional Euclidean metric: given a radius r, the ball query finds the points inside the sphere of radius r centered at each FPS sampling point; these are called the neighbor points p_j of the center point p_i. An SA layer and a VA layer are added to extract point cloud features; the SA layer performs bilateral feature-information fusion and local feature extraction of the point cloud, and the VA layer, an improved self-attention layer, performs global feature extraction of the point cloud;
in the SA layer, the absolute position of the center point and the relative positions of its neighborhood are combined into a geometric-space local feature G(p_i, p_j) = [p_i; p_j − p_i]; likewise, S(f_i, f_j) = [f_i; f_j − f_i] represents the local feature in semantic space; the geometric information is converted into semantic space by an MLP and turned into a feature mask by a softmax function, which corrects the edge feature information of the semantic space; likewise, the same method is adopted in the geometric space to dynamically adjust the geometric edge information; meanwhile, the adjusted edge information is added to the original edge information, establishing a residual structure to ensure the robustness of the information optimization; the calculation of this correction information can be formulated as:
f_s = Softmax(mlp(G(p_i, p_j))) * S_e + S_e    (1)
p_s = Softmax(mlp(S(f_i, f_j))) * G_e + G_e    (2)
wherein G_e(p_i, p_j) = [p_i; p_j − p_i], S_e(f_i, f_j) = [f_i; f_j − f_i], p_i, p_j, p_s ∈ R^(N×3), and f_i, f_j, f_s ∈ R^(N×d).
The obtained supplementary information p_s, f_s is then combined with G(p_i, p_j) and S(f_i, f_j) to obtain G' = [p_i; p_j − p_i; p_s] and S' = [f_i; f_j − f_i; f_s], wherein G' ∈ R^(N×9) and S' ∈ R^(N×3×d); G' and S' are then concatenated to obtain the enhanced local feature information F_c; F_c is fed into two LBR layers for feature extraction to obtain F_c'; finally, F_c' is fed into a max-pooling layer to complete the feature aggregation of the local region, obtaining the local aggregation feature F_a; after the local aggregation feature F_a is obtained, the aggregated local-region information is fed into the VA layer for global feature extraction;
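The bilateral correction of equations (1)–(2) and the subsequent concatenation and max-pooling can be sketched as follows; this is a minimal numpy sketch in which a single random linear layer stands in for each learned MLP, the two LBR layers are omitted, and the sketch keeps the full corrected edge features (so the widths differ from the claim's G' ∈ R^(N×9)); all sizes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mlp(x, w):                       # stand-in for a learned MLP: one linear map
    return x @ w

N, k, d = 64, 16, 32                 # centers, neighbors per center, feature width
p_i = rng.random((N, 1, 3))          # center coordinates
p_j = rng.random((N, k, 3))          # neighbor coordinates
f_i = rng.random((N, 1, d))          # center semantic features
f_j = rng.random((N, k, d))          # neighbor semantic features

# Local edge features in geometric and semantic space: [x_i; x_j - x_i].
G_e = np.concatenate([np.broadcast_to(p_i, p_j.shape), p_j - p_i], axis=-1)  # [N,k,6]
S_e = np.concatenate([np.broadcast_to(f_i, f_j.shape), f_j - f_i], axis=-1)  # [N,k,2d]

# Eq. (1): geometric info, mapped by an MLP and softmax into a mask, corrects
# the semantic edge features; the residual "+ S_e" keeps the original signal.
W_g = rng.standard_normal((6, 2 * d))
f_s = softmax(mlp(G_e, W_g)) * S_e + S_e

# Eq. (2): semantic info likewise corrects the geometric edge features.
W_s = rng.standard_normal((2 * d, 6))
p_s = softmax(mlp(S_e, W_s)) * G_e + G_e

# Enhanced local features: concatenate corrections with the original edge
# features, then max-pool over the neighborhood (the two LBR layers would
# sit before this pooling in the claimed method).
F_c = np.concatenate([G_e, p_s, S_e, f_s], axis=-1)
F_a = F_c.max(axis=1)                # local aggregation feature
assert F_a.shape == (N, 12 + 4 * d)
```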
in the traditional attention mechanism, attention over the point cloud is calculated according to the following steps: first, the embedded feature F_a is fed into three separable convolutions with kernel size 1×1 to produce three new feature maps F_q, F_k and F_v; subsequently, the new feature map F_q is transposed and multiplied by the new feature map F_k, generating, after the softmax layer, an attention matrix of size N×N:
A = Softmax(F_q^T F_k)    (3)
the method of the vector attention mechanism is introduced into the calculation of the traditional attention mechanism: the attention matrix is generated from the subtraction, passed through two linear layers and a softmax function, with the calculation formula:
Attention(F_Q, F_K, F_V) = Softmax(γ(F_Q − F_K)) F_V    (4)
wherein γ is a mapping function that generates the attention vectors used for feature aggregation; the obtained attention matrix F_A is multiplied pointwise with the input feature F_a to obtain the attention feature F_v; on this basis, an offset attention mechanism is designed, expressed by the following formula:
F_out = Relu(Batchnorm(Mlp(F_a − F_A))) + F_a    (5)
obtaining the output feature information F_out.
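The vector-attention and offset-attention steps of equations (4)–(5) can be sketched as follows; this is a minimal numpy sketch under stated assumptions: the 1×1 convolutions reduce to per-point linear maps, γ is modeled by a single random linear layer (the claim uses two linear layers), the softmax is taken over the feature channels, and batch normalization is replaced by a simple per-feature standardization:

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relu(x):
    return np.maximum(x, 0.0)

def batchnorm(x):                    # stand-in for BatchNorm in this sketch
    return (x - x.mean(0)) / (x.std(0) + 1e-5)

N, d = 128, 64
F_a = rng.random((N, d))             # aggregated local features from the SA layer

# 1x1 convolutions over points are per-point linear maps here.
W_q, W_k, W_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
F_q, F_k, F_v = F_a @ W_q, F_a @ W_k, F_a @ W_v

# Eq. (4): vector attention -- subtraction instead of a dot product; the
# resulting attention vectors weight F_v channel-wise.
W_gamma = rng.standard_normal((d, d)) * 0.1
F_A = softmax((F_q - F_k) @ W_gamma, axis=-1) * F_v

# Eq. (5): offset attention with a residual connection back to F_a.
W_out = rng.standard_normal((d, d)) * 0.1
F_out = relu(batchnorm((F_a - F_A) @ W_out)) + F_a
assert F_out.shape == (N, d)
```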
4. The point cloud segmentation method based on bilateral feature fusion and vector self-attention as claimed in claim 3, wherein the decoding of the point cloud features in step S3 specifically comprises:
the decoding part adopts successive FP layers to up-sample the point cloud features so as to achieve point cloud segmentation; distance-based interpolation and a hierarchical propagation strategy across skip links are adopted; at a feature propagation level, point features are propagated from N_l × (d + C) points to N_(l−1) points, wherein N_(l−1) and N_l are the point-set sizes of the input and output of the l-th SA layer, with N_l ≤ N_(l−1); feature propagation is realized by interpolating the feature values f of the N_l points at the coordinates of the N_(l−1) points; among the many choices of interpolation, the inverse-distance-weighted average based on the k nearest neighbors is used; the interpolated features on the N_(l−1) points are then concatenated with the skip-linked point features from the set abstraction level, and a shared fully connected layer and ReLU layer are applied to update the feature vector of each point; this process is repeated until the features have been propagated to the original point set;
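The inverse-distance-weighted interpolation used by the feature propagation step can be sketched as follows; this is a minimal numpy sketch with k = 3 neighbors, and the point counts and function name are illustrative assumptions:

```python
import numpy as np

def knn_interpolate(xyz_src, feat_src, xyz_dst, k=3, eps=1e-8):
    """Propagate features from a sparse set (N_l points) to a denser set
    (N_{l-1} points) by an inverse-distance-weighted average over the
    k nearest source points, as in the FP step described above."""
    out = np.empty((xyz_dst.shape[0], feat_src.shape[1]))
    for i, q in enumerate(xyz_dst):
        d = np.linalg.norm(xyz_src - q, axis=1)
        nn = np.argsort(d)[:k]            # k nearest source points
        w = 1.0 / (d[nn] + eps)           # inverse-distance weights
        w /= w.sum()
        out[i] = w @ feat_src[nn]         # weighted average of their features
    return out

sparse_xyz = np.random.rand(64, 3)        # N_l points (SA-layer output)
sparse_feat = np.random.rand(64, 32)
dense_xyz = np.random.rand(256, 3)        # N_{l-1} points to up-sample to
dense_feat = knn_interpolate(sparse_xyz, sparse_feat, dense_xyz)
assert dense_feat.shape == (256, 32)
# The interpolated features would then be concatenated with the skip-linked
# features and passed through shared fully connected + ReLU layers.
```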
and finally, obtaining a segmentation result of the point cloud.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310780811.7A CN116824143A (en) | 2023-06-29 | 2023-06-29 | Point cloud segmentation method based on bilateral feature fusion and vector self-attention |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116824143A true CN116824143A (en) | 2023-09-29 |
Family
ID=88116367
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117278150A (en) * | 2023-11-23 | 2023-12-22 | 成都工业学院 | Indoor wireless network signal measurement and calculation method, equipment and medium |
CN117278150B (en) * | 2023-11-23 | 2024-02-09 | 成都工业学院 | Indoor wireless network signal measurement and calculation method, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||