CN116824143A - Point cloud segmentation method based on bilateral feature fusion and vector self-attention - Google Patents

Point cloud segmentation method based on bilateral feature fusion and vector self-attention

Info

Publication number: CN116824143A
Application number: CN202310780811.7A
Authority: CN
Prior art keywords: point cloud, feature, attention, information, layer
Other languages: Chinese (zh)
Inventors: 胡海兵, 刘泓淳, 冯鑫
Current and original assignee: Hefei University of Technology (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Priority and filing date: 2023-06-29
Publication date: 2023-09-29
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status or dates listed)
Application filed by Hefei University of Technology; priority to CN202310780811.7A; published as CN116824143A (en)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V 10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a point cloud segmentation method based on bilateral feature fusion and vector self-attention, in the technical field of point cloud semantic segmentation. The method comprises the following steps: inputting original point cloud data; encoding the input original point cloud data using a bilateral feature fusion module and a vector self-attention module; and decoding the point cloud features, up-sampling them with successive FP layers to obtain the point cloud segmentation result. The invention provides an efficient point cloud semantic segmentation network that makes semantic segmentation of point clouds faster and more accurate, with superior segmentation performance; a correction-information computing module based on bilateral geometric-semantic feature information, which adjusts edge information, alleviates the edge ambiguity in local-region aggregation, and strengthens the aggregation of local information; and a novel offset vector self-attention module, which effectively extracts the global features of the point cloud and achieves better global feature extraction while reducing the network's computational cost.

Description

Point cloud segmentation method based on bilateral feature fusion and vector self-attention
Technical Field
The invention relates to the technical field of point cloud semantic segmentation, in particular to a point cloud segmentation method based on bilateral feature fusion and vector self-attention, which is suitable for semantic segmentation of indoor point clouds.
Background
With deepening research, methods that process 3D point clouds with deep learning have achieved remarkable success. These methods can generally be classified into three types: projection-based methods, voxel-based methods, and point-based methods. Among them, point-based methods, which process the point set directly with a multi-layer perceptron (MLP), have become mainstream due to their efficiency and high performance.
Among point-based approaches, PointNet is a classic network. It extracts features with a shared multi-layer perceptron (MLP) and aggregates global features through a symmetric function, so the result is invariant to the internal ordering of the points. However, PointNet samples points individually and therefore cannot effectively extract local features. PointNet++ builds on PointNet with sampling and grouping operations that enable multi-level feature extraction and largely resolve this problem. Yet the grouping-based local feature extraction of PointNet++ causes edge ambiguity between local regions: during neighborhood construction, outliers and overlap between neighborhoods are hard to avoid, and they become more prominent in regions where multiple semantic classes intersect. Moreover, aggregation regions divided by Euclidean distance cannot adapt well to semantic features within a local range of the semantic space, so PointNet++ focuses on extracting geometric information, aggregates local feature information insufficiently, is weaker at semantic extraction, and, relying only on FPS, extracts global features inadequately.
Compared with PointNet++, the recently proposed PointNeXt focuses on training techniques and scaling strategies to further improve the performance of PointNet++, and PointMLP achieves very high classification performance without any complex local feature extractor by introducing a residual MLP structure. However, these methods concentrate on feature extraction in geometric space, and the edge ambiguity and insufficient global feature extraction of PointNet++ remain unresolved. Other methods focus on semantic feature extraction: DGCNN proposes edge convolution (EdgeConv) to learn edge features, constructing a local neighborhood graph, applying EdgeConv on each adjacent edge, and dynamically updating the graph structure between levels; AdaptiveGraph assigns learned weights to each edge to better evaluate and aggregate information. These methods adopt similar grouping schemes and thus suffer the same edge ambiguity as PointNet++, and because their feature extraction focuses on the semantic space, the geometric structure is missing in the high-dimensional semantic space. Following its great success in natural language processing and two-dimensional image tasks, the self-attention mechanism has also been applied to three-dimensional point clouds; attention has a strong ability to extract global features, but at a large computational cost.
In summary, most current point cloud semantic segmentation networks extract local features by grouping and aggregating the point cloud, and the ambiguity of neighborhood edges within a group is difficult to resolve. Aggregation regions divided by Euclidean distance cannot adapt well to semantic features in the local scope of the semantic space, the geometric structure is missing in the high-dimensional semantic space, and the aggregation of local feature information is therefore insufficient. Meanwhile, building multi-scale feature extraction by downsampling alone loses much detail information, so global features cannot be fully extracted. Solving these problems is the technical challenge addressed by the prior art.
Disclosure of Invention
In view of the technical problems described in the background art, the invention provides a point cloud segmentation method based on bilateral feature fusion and vector self-attention. It improves the robustness and accuracy of point cloud scene semantic segmentation, strengthens the feature learning capability of the semantic segmentation network, and alleviates the edge ambiguity of local-region features and the insufficient representativeness of global features.
The technical scheme adopted by the invention is as follows:
A point cloud segmentation method based on bilateral feature fusion and vector self-attention comprises the following steps:
S1: inputting original point cloud data;
S2: encoding the input original point cloud data using a bilateral feature fusion module and a vector self-attention module;
S3: decoding the point cloud features and up-sampling them with successive FP layers to obtain the point cloud segmentation result.
Further, inputting the original point cloud data in step S1 specifically comprises:
S3DIS is used as the indoor test dataset. S3DIS is a large indoor scene segmentation dataset containing 13 categories and 271 rooms. Each point has 9 features: color information R, G, B; coordinate information x, y, z; and a 3-dimensional normal vector. The 271 rooms are divided into 6 areas, and each room is partitioned into 1 m × 1 m blocks. The input point cloud F_in has dimension [B, N, 9], where B is the batch size, N is the number of points, and 9 is the number of per-point features, so the total number of input features is B × N × 9.
Further, in the step S2, the encoding of the input original point cloud data using the bilateral feature fusion module and the vector self-attention module specifically includes:
The input point cloud is divided into a geometric space, containing the coordinate information of the point cloud with dimension [B, N, 3], and a semantic space, containing the color information and normal vectors of the point cloud with dimension [B, N, 6]. The semantic part is passed through an MLP to map it into the semantic feature space, and both parts are fed into the encoder SA layers. FPS is applied to the raw data to generate sampling points p_i with corresponding feature representations f_i. Taking the sampling points as centers, the point cloud is grouped by ball query under the three-dimensional Euclidean metric: given a radius r, ball query finds the points within the sphere of radius r centered on each FPS sampling point, and these points are called the neighbors p_j of the center point p_i. SA layers and VA layers are then added to extract point cloud features: the SA layer performs bilateral feature information fusion and local feature extraction of the point cloud, and the VA layer, an improved self-attention layer, performs global feature extraction of the point cloud.
In the SA layer, the absolute position of the center point and the relative positions of its neighborhood are combined into the geometric-space local feature G(p_i, p_j) = [p_i; p_j - p_i]. Likewise, S(f_i, f_j) = [f_i; f_j - f_i] represents the local feature in semantic space. The geometric information is mapped into the semantic space by an MLP and converted into a feature mask by a softmax function, which corrects the edge feature information of the semantic space; the same method is applied symmetrically to dynamically adjust the geometric edge information of the geometric space. The adjusted edge information is added to the original edge information, establishing a residual structure that ensures the robustness of the information optimization. The correction information is computed as:
f_s = Softmax(MLP(G(p_i, p_j))) * S_e + S_e    (1)
p_s = Softmax(MLP(S(f_i, f_j))) * G_e + G_e    (2)
where G_e(p_i, p_j) = [p_i; p_j - p_i], S_e(f_i, f_j) = [f_i; f_j - f_i], p_i, p_j, p_s ∈ R^(N×3), and f_i, f_j, f_s ∈ R^(N×d).
The obtained supplementary information p_s, f_s is then combined with G(p_i, p_j) and S(f_i, f_j) to obtain G' = [p_i; p_j - p_i; p_s] and S' = [f_i; f_j - f_i; f_s], where G' ∈ R^(N×9) and S' ∈ R^(N×3×d). These are concatenated into the enhanced local feature information F_c, which is fed into two LBR layers for feature extraction to obtain F_c'. Finally, F_c' is sent into a max-pooling layer to complete the feature aggregation of the local region, giving the local aggregation feature F_a. After the local aggregation feature F_a is obtained, the local-region aggregation information is fed into the VA layer for global feature extraction.
In the traditional attention mechanism, attention over the point cloud is computed in the following steps. First, the embedded feature F_a is fed into three separable convolutions with kernel size 1 × 1 to produce three new feature maps F_q, F_k, and F_v. Then F_q is transposed and multiplied by F_k, and after a softmax layer this yields an attention matrix of size N × N:
Attention = Softmax(F_q^T F_k)    (3)
A vector attention mechanism is introduced into the traditional attention computation: the attention matrix is generated from a subtraction passed through two linear layers and a softmax function, computed as
Attention(F_Q, F_K, F_V) = Softmax(γ(F_Q - F_K)) F_V    (4)
where γ is a mapping function that generates the attention vectors used for feature aggregation. Multiplying the resulting attention weights with the value feature F_V yields the attention feature F_A. To strengthen the attention weights and reduce the influence of noise, an offset attention mechanism is designed, expressed by the following formula:
F_out = ReLU(BatchNorm(MLP(F_a - F_A))) + F_a    (5)
The output feature information F_out is obtained.
Further, the decoding of the point cloud feature in step S3 specifically includes:
The decoding part uses successive FP layers to up-sample the point cloud features and obtain the point cloud segmentation. Distance-based interpolation and a hierarchical propagation strategy across skip links are adopted. At a feature propagation level, point features are propagated from the N_l × (d + C) points to the N_(l-1) points, where N_(l-1) and N_l (with N_l ≤ N_(l-1)) are the point set sizes of the input and output of SA layer l. Feature propagation is realized by interpolating the feature values f of the N_l points at the coordinates of the N_(l-1) points. Among the many choices of interpolation, an inverse-distance weighted average based on the k nearest neighbors is used. The interpolated features at the N_(l-1) points are then concatenated with the skip-linked point features from the set abstraction level, the concatenated features are passed through a "PointNet unit", and shared fully connected and ReLU layers are applied to update each point's feature vector. This process is repeated until the features have been propagated to the original set of points.
Finally, the segmentation result of the point cloud is obtained.
Compared with the prior art, the point cloud segmentation method based on bilateral feature fusion and an attention mechanism provided by the invention has at least the following beneficial effects:
1) The invention provides an efficient point cloud semantic segmentation network, making semantic segmentation of point clouds faster and more accurate, with superior segmentation performance.
2) The invention provides a correction-information calculation module based on bilateral geometric-semantic feature information, which adjusts edge information, alleviates the edge ambiguity in local-region aggregation, and strengthens the aggregation of local information.
3) The invention provides a new offset vector self-attention module, which effectively extracts the global features of the point cloud and achieves better global feature extraction while reducing the network's computational cost.
Drawings
FIG. 1 is a process schematic diagram of a bilateral feature aggregation module;
FIG. 2 is a process schematic of the attention module;
FIG. 3 is a schematic diagram of a point cloud semantic segmentation network process based on bilateral feature fusion and attention;
FIG. 4 is a point cloud semantic segmentation effect diagram.
Detailed Description
The following clearly and completely describes the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention.
Example 1.
As shown in fig. 1-4, a point cloud segmentation method based on bilateral feature fusion and vector self-attention, the method comprises the following steps:
s1: inputting original point cloud data
S3DIS is used as the indoor test dataset. S3DIS is a large indoor scene segmentation dataset containing 13 categories and 271 rooms. Each point has 9 features: color information R, G, B; coordinate information x, y, z; and a 3-dimensional normal vector. The 271 rooms are divided into 6 areas, and each room is partitioned into 1 m × 1 m blocks. The input point cloud F_in has dimension [B, N, 9], where B is the batch size, N is the number of points, and 9 is the number of per-point features, so the total number of input features is B × N × 9;
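For illustration, a minimal PyTorch-style sketch of this input layout and of the geometric/semantic split used in S2 follows; the tensor names and the random data are illustrative assumptions, not from the patent:

```python
import torch

B, N = 16, 4096                    # assumed batch size and points per 1 m x 1 m block
F_in = torch.rand(B, N, 9)         # 9 per-point features: x, y, z, R, G, B, and the normal

xyz      = F_in[..., 0:3]          # geometric space, dimension [B, N, 3]
semantic = F_in[..., 3:9]          # semantic space (color + normals), dimension [B, N, 6]
print(xyz.shape, semantic.shape)   # torch.Size([16, 4096, 3]) torch.Size([16, 4096, 6])
```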
s2: encoding input original point cloud data by using bilateral feature fusion module and vector self-attention module
The input point cloud is divided into a geometric space, containing the coordinate information of the point cloud with dimension [B, N, 3], and a semantic space, containing the color information and normal vectors of the point cloud with dimension [B, N, 6]. The semantic part is passed through an MLP to map it into the semantic feature space, and both parts are fed into the encoder SA layers. First, FPS is applied to the raw data to generate sampling points p_i with corresponding feature representations f_i. Taking the sampling points as centers, the point cloud is grouped by ball query under the three-dimensional Euclidean metric: given a radius r, ball query finds the points within the sphere of radius r centered on each FPS sampling point, and these points are called the neighbors p_j of the center point p_i. In this model, 4 consecutive FPS sampling layers construct point cloud sampling sets at the scales N/4, N/16, N/64, and N/128, with grouping radii 0.1, 0.2, 0.4, and 0.8 at the respective layers, and SA and VA layers are added to extract point cloud features (a grouping sketch follows below). The SA layer performs bilateral feature information fusion and local feature extraction of the point cloud, and the VA layer, an improved self-attention layer, performs global feature extraction of the point cloud.
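For concreteness, the following is a minimal sketch of the FPS sampling and ball-query grouping this step relies on, in the style of common PointNet++ implementations; the function names and the padding rule for under-filled balls are assumptions, not fixed by the patent:

```python
import torch

def farthest_point_sample(xyz: torch.Tensor, n_sample: int) -> torch.Tensor:
    """Naive FPS: iteratively pick the point farthest from the already-chosen set.
    xyz: [B, N, 3] coordinates; returns sample indices of shape [B, n_sample]."""
    B, N, _ = xyz.shape
    idx = torch.zeros(B, n_sample, dtype=torch.long, device=xyz.device)
    dist = torch.full((B, N), float("inf"), device=xyz.device)
    farthest = torch.randint(0, N, (B,), device=xyz.device)   # random seed point
    batch = torch.arange(B, device=xyz.device)
    for i in range(n_sample):
        idx[:, i] = farthest
        centroid = xyz[batch, farthest].unsqueeze(1)          # [B, 1, 3]
        dist = torch.minimum(dist, ((xyz - centroid) ** 2).sum(-1))
        farthest = dist.argmax(-1)                            # farthest from chosen set
    return idx

def ball_query(xyz: torch.Tensor, centers: torch.Tensor, radius: float, k: int):
    """Group up to k neighbours within `radius` of each center point p_i; slots
    with no in-range point are clamped to the nearest neighbour, mirroring
    common implementations. xyz: [B, N, 3], centers: [B, S, 3] -> [B, S, k]."""
    d2 = torch.cdist(centers, xyz) ** 2                       # squared distances [B, S, N]
    idx = d2.argsort(dim=-1)[:, :, :k]                        # k nearest candidates
    mask = torch.gather(d2, 2, idx) > radius ** 2             # outside the ball?
    idx[mask] = idx[:, :, 0:1].expand_as(idx)[mask]           # clamp to nearest point
    return idx
```

In the four encoder stages above, these would be called with n_sample = N/4, N/16, N/64, N/128 and radius r = 0.1, 0.2, 0.4, 0.8 respectively.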
In the SA layer, the absolute position of the center point and the relative positions of its neighborhood are combined into the geometric-space local feature G(p_i, p_j) = [p_i; p_j - p_i]. Likewise, S(f_i, f_j) = [f_i; f_j - f_i] represents the local feature in semantic space. The geometric information is mapped into the semantic space by an MLP and converted into a feature mask by a softmax function, which corrects the edge feature information of the semantic space; the same method is applied symmetrically to dynamically adjust the geometric edge information of the geometric space. The adjusted edge information is added to the original edge information, establishing a residual structure that ensures the robustness of the information optimization. The correction information is computed as:
f_s = Softmax(MLP(G(p_i, p_j))) * S_e + S_e    (1)
p_s = Softmax(MLP(S(f_i, f_j))) * G_e + G_e    (2)
where G_e(p_i, p_j) = [p_i; p_j - p_i], S_e(f_i, f_j) = [f_i; f_j - f_i], p_i, p_j, p_s ∈ R^(N×3), and f_i, f_j, f_s ∈ R^(N×d).
The obtained supplementary information p_s, f_s is then combined with G(p_i, p_j) and S(f_i, f_j) to obtain G' = [p_i; p_j - p_i; p_s] and S' = [f_i; f_j - f_i; f_s], where G' ∈ R^(N×9) and S' ∈ R^(N×3×d). These are concatenated into the local feature information F_c, which is fed into two LBR (Linear + BatchNorm + ReLU) layers for feature extraction to obtain F_c'. Finally, F_c' is sent into a max-pooling layer to complete the feature aggregation of the local region, giving the local aggregation feature F_a.
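The sketch below illustrates this SA computation, from the corrections of Eqs. (1) and (2) through the concatenation, the two LBR layers, and the max-pooling; the layer widths and the placement of BatchNorm are assumptions, since the text fixes only the formulas:

```python
import torch
import torch.nn as nn

class BilateralSA(nn.Module):
    """Sketch of the SA layer: each space's edge features produce a softmax mask
    that corrects the other space's edges with a residual (Eqs. (1)-(2)); the
    corrected edges are concatenated into F_c, passed through two LBR layers,
    and max-pooled over each ball to give the local aggregation feature F_a."""
    def __init__(self, d: int, d_out: int):
        super().__init__()
        self.g2s = nn.Linear(6, 2 * d)       # MLP: geometric edges -> mask over semantic edges
        self.s2g = nn.Linear(2 * d, 6)       # MLP: semantic edges  -> mask over geometric edges
        self.lbr = nn.Sequential(            # two LBR (Linear + BatchNorm + ReLU) layers
            nn.Linear(12 + 4 * d, d_out), nn.BatchNorm1d(d_out), nn.ReLU(),
            nn.Linear(d_out, d_out), nn.BatchNorm1d(d_out), nn.ReLU(),
        )

    def forward(self, G_e: torch.Tensor, S_e: torch.Tensor) -> torch.Tensor:
        # G_e: [M, k, 6] = [p_i ; p_j - p_i], S_e: [M, k, 2d] = [f_i ; f_j - f_i],
        # where M = batch x sampled centers and k = neighbours per ball.
        f_s = torch.softmax(self.g2s(G_e), -1) * S_e + S_e    # Eq. (1): corrected semantic edges
        p_s = torch.softmax(self.s2g(S_e), -1) * G_e + G_e    # Eq. (2): corrected geometric edges
        F_c = torch.cat([G_e, p_s, S_e, f_s], dim=-1)         # G' and S' concatenated -> F_c
        M, k, C = F_c.shape
        F_c2 = self.lbr(F_c.reshape(M * k, C)).reshape(M, k, -1)  # F_c'
        return F_c2.max(dim=1).values                         # max-pool over the ball -> F_a

# e.g. d = 32 semantic channels, 64 centers, 16 neighbours per ball:
# BilateralSA(d=32, d_out=128)(torch.rand(64, 16, 6), torch.rand(64, 16, 64)) -> [64, 128]
```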
After the local aggregation feature F_a is obtained, the local-region aggregation information is fed into the VA layer for global feature extraction.
Before introducing the improved attention mechanism, the traditional attention mechanism is reviewed; attention over a point cloud is usually computed in the following steps. First, the embedded feature F_a is fed into three separable convolutions with kernel size 1 × 1 to produce three new feature maps F_q, F_k, and F_v. Then F_q is transposed and multiplied by F_k, and after a softmax layer this yields an attention matrix of size N × N.
The attention module introduces a vector attention mechanism into the traditional attention computation, generating the attention matrix from a subtraction passed through two linear layers and a softmax function. Unlike conventional attention, the attention weights in vector attention are vectors that can modulate individual feature channels, and replacing the multiplication with a subtraction reduces the computation required for the attention matrix. The calculation formula is
Attention(F_Q, F_K, F_V) = Softmax(γ(F_Q - F_K)) F_V    (4)
where γ is a mapping function (e.g., an MLP) that generates the attention vectors used for feature aggregation; multiplying them with the value feature F_V yields the attention feature F_A. Typically, the attention feature is then fed into an MLP layer and the input feature F_a is added through a residual link to give the final output feature. Here, in order to strengthen the attention weights and reduce the effect of noise, an offset attention mechanism is designed: it replaces the attention feature with the offset between the input of the self-attention module and the attention feature. It can be expressed by the following formula:
F_out = ReLU(BatchNorm(MLP(F_a - F_A))) + F_a    (5)
The output feature information F_out is obtained.
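A minimal sketch of this offset vector self-attention (Eqs. (4) and (5)) follows; it reads the subtraction in Eq. (4) as a point-wise difference and uses an assumed two-layer MLP for γ, details the text does not fix:

```python
import torch
import torch.nn as nn

class OffsetVectorAttention(nn.Module):
    """Sketch of the VA layer: vector attention weights from softmax(gamma(Q - K))
    modulate V channel-wise (Eq. (4)); the output is ReLU(BN(MLP(F_a - F_A))) + F_a
    (Eq. (5)), i.e. attention on the offset with a residual link."""
    def __init__(self, d: int):
        super().__init__()
        self.q = nn.Linear(d, d, bias=False)
        self.k = nn.Linear(d, d, bias=False)
        self.v = nn.Linear(d, d, bias=False)
        self.gamma = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
        self.mlp = nn.Linear(d, d)
        self.bn = nn.BatchNorm1d(d)

    def forward(self, F_a: torch.Tensor) -> torch.Tensor:   # F_a: [B, N, d]
        Fq, Fk, Fv = self.q(F_a), self.k(F_a), self.v(F_a)
        w = torch.softmax(self.gamma(Fq - Fk), dim=-1)       # vector attention weights, Eq. (4)
        F_A = w * Fv                                         # channel-wise attention feature
        off = self.mlp(F_a - F_A)                            # offset between input and attention
        off = self.bn(off.transpose(1, 2)).transpose(1, 2)   # BatchNorm over the d channels
        return torch.relu(off) + F_a                         # Eq. (5): residual output F_out

# e.g. OffsetVectorAttention(d=128)(torch.rand(8, 1024, 128)) -> [8, 1024, 128]
```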
S3: decoding point cloud features
The decoding part uses successive FP layers to up-sample the point cloud features and obtain the point cloud segmentation. In short, an FP layer aggregates features back onto the original point cloud; equivalently, it propagates features from the sub-sampled points back to the original points.
Distance-based interpolation and a hierarchical propagation strategy across skip links are employed. At a feature propagation level, point features are propagated from the N_l × (d + C) points to the N_(l-1) points, where N_(l-1) and N_l (with N_l ≤ N_(l-1)) are the point set sizes of the input and output of SA layer l. Feature propagation is realized by interpolating the feature values f of the N_l points at the coordinates of the N_(l-1) points. Among the many choices of interpolation, an inverse-distance weighted average based on the k nearest neighbors is used (as in the PointNet++ interpolation formula; p = 2 and k = 3 by default), as sketched below. The interpolated features at the N_(l-1) points are then concatenated with the skip-linked point features from the set abstraction level. The concatenated features are passed through a "PointNet unit", similar to one-by-one convolution in a CNN, and shared fully connected and ReLU layers are applied to update each point's feature vector. This process is repeated until the features have been propagated to the original set of points.
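As referenced above, here is a minimal sketch of that inverse-distance-weighted interpolation (k = 3 neighbors, power p = 2); the function name and the epsilon guard against zero distances are illustrative assumptions:

```python
import torch

def fp_interpolate(xyz_dst: torch.Tensor, xyz_src: torch.Tensor,
                   feat_src: torch.Tensor, k: int = 3, p: int = 2,
                   eps: float = 1e-8) -> torch.Tensor:
    """Inverse-distance-weighted feature interpolation used by the FP layer.
    xyz_dst: [B, M, 3] coordinates of the denser level (N_{l-1} points),
    xyz_src: [B, S, 3] coordinates of the sparser level (N_l points),
    feat_src: [B, S, C] features at the sparser level; returns [B, M, C]."""
    d = torch.cdist(xyz_dst, xyz_src)                  # pairwise distances [B, M, S]
    d_k, idx = d.topk(k, dim=-1, largest=False)        # k = 3 nearest sparse points
    w = 1.0 / (d_k ** p + eps)                         # weights w_i = 1 / d_i^p, p = 2
    w = w / w.sum(dim=-1, keepdim=True)                # normalise the weights
    B, M, _ = idx.shape
    C = feat_src.shape[-1]
    nb = torch.gather(feat_src.unsqueeze(1).expand(B, M, -1, C), 2,
                      idx.unsqueeze(-1).expand(B, M, k, C))  # neighbour features [B, M, k, C]
    return (w.unsqueeze(-1) * nb).sum(dim=2)           # weighted average [B, M, C]
```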
In this model, 4 consecutive FP layers are used to up-sample the point cloud, and the segmentation result of the point cloud is finally obtained, as shown in Table 1.
TABLE 1 qualitative segmentation results
The foregoing is only a preferred embodiment of the present invention, and the scope of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art according to the technical scheme and inventive concept of the present invention, within the scope disclosed by the present invention, shall be covered by the scope of the present invention.

Claims (4)

1. A point cloud segmentation method based on bilateral feature fusion and vector self-attention is characterized by comprising the following steps:
S1: inputting original point cloud data;
S2: encoding the input original point cloud data using a bilateral feature fusion module and a vector self-attention module;
S3: decoding the point cloud features and up-sampling them with successive FP layers to obtain the point cloud segmentation result.
2. The point cloud segmentation method based on bilateral feature fusion and vector self-attention as claimed in claim 1, wherein the input original point cloud data in step S1 specifically comprises:
S3DIS is used as the indoor test dataset; S3DIS is a large indoor scene segmentation dataset containing 13 categories and 271 rooms; each point has 9 features: color information R, G, B, coordinate information x, y, z, and a 3-dimensional normal vector; the 271 rooms are divided into 6 areas, and each room is partitioned into 1 m × 1 m blocks; the input point cloud F_in has dimension [B, N, 9], where B is the batch size, N is the number of points, and 9 is the number of per-point features, so the total number of input features is B × N × 9.
3. The method for point cloud segmentation based on bilateral feature fusion and vector self-attention according to claim 2, wherein the encoding of the input original point cloud data using the bilateral feature fusion module and the vector self-attention module in step S2 specifically comprises:
dividing the input point cloud into a geometric space, containing the coordinate information of the point cloud with dimension [B, N, 3], and a semantic space, containing the color information and normal vectors of the point cloud with dimension [B, N, 6]; the semantic part is passed through an MLP to map it into the semantic feature space, and both parts are fed into the encoder SA layers; FPS is applied to the raw data to generate sampling points p_i with corresponding feature representations f_i; taking the sampling points as centers, the point cloud is grouped by ball query under the three-dimensional Euclidean metric, where ball query, given a radius r, finds the points within the sphere of radius r centered on each FPS sampling point, and these points are called the neighbors p_j of the center point p_i; SA layers and VA layers are added to extract point cloud features; the SA layer performs bilateral feature information fusion and local feature extraction of the point cloud, and the VA layer, an improved self-attention layer, performs global feature extraction of the point cloud;
in the SA layer, the absolute position of the center point and the relative positions of its neighborhood are combined into the geometric-space local feature G(p_i, p_j) = [p_i; p_j - p_i]; likewise, S(f_i, f_j) = [f_i; f_j - f_i] represents the local feature in semantic space; the geometric information is mapped into the semantic space by an MLP and converted into a feature mask by a softmax function, which corrects the edge feature information of the semantic space; the same method is applied symmetrically to dynamically adjust the geometric edge information of the geometric space; the adjusted edge information is added to the original edge information, establishing a residual structure that ensures the robustness of the information optimization; the correction information is computed as:
f_s = Softmax(MLP(G(p_i, p_j))) * S_e + S_e    (1)
p_s = Softmax(MLP(S(f_i, f_j))) * G_e + G_e    (2)
where G_e(p_i, p_j) = [p_i; p_j - p_i], S_e(f_i, f_j) = [f_i; f_j - f_i], p_i, p_j, p_s ∈ R^(N×3), and f_i, f_j, f_s ∈ R^(N×d).
The obtained supplementary information p_s, f_s is then combined with G(p_i, p_j) and S(f_i, f_j) to obtain G' = [p_i; p_j - p_i; p_s] and S' = [f_i; f_j - f_i; f_s], where G' ∈ R^(N×9) and S' ∈ R^(N×3×d); these are concatenated into the enhanced local feature information F_c, which is fed into two LBR layers for feature extraction to obtain F_c'; finally, F_c' is sent into a max-pooling layer to complete the feature aggregation of the local region, giving the local aggregation feature F_a; after the local aggregation feature F_a is obtained, the local-region aggregation information is fed into the VA layer for global feature extraction;
in the traditional attention mechanism, attention over the point cloud is computed in the following steps: first, the embedded feature F_a is fed into three separable convolutions with kernel size 1 × 1 to produce three new feature maps F_q, F_k, and F_v; then F_q is transposed and multiplied by F_k, and after a softmax layer this yields an attention matrix of size N × N:
Attention = Softmax(F_q^T F_k)    (3)
a vector attention mechanism is introduced into the traditional attention computation: the attention matrix is generated from a subtraction passed through two linear layers and a softmax function, computed as
Attention(F_Q, F_K, F_V) = Softmax(γ(F_Q - F_K)) F_V    (4)
where γ is a mapping function that generates the attention vectors used for feature aggregation; multiplying the resulting attention weights with the value feature F_V yields the attention feature F_A; to strengthen the attention weights and reduce the influence of noise, an offset attention mechanism is designed, expressed by the following formula:
F_out = ReLU(BatchNorm(MLP(F_a - F_A))) + F_a    (5)
The output feature information F_out is obtained.
4. The point cloud segmentation method based on bilateral feature fusion and vector self-attention as claimed in claim 3, wherein the decoding of the point cloud features in step S3 specifically comprises:
the decoding part uses successive FP layers to up-sample the point cloud features and obtain the point cloud segmentation; distance-based interpolation and a hierarchical propagation strategy across skip links are adopted; at a feature propagation level, point features are propagated from the N_l × (d + C) points to the N_(l-1) points, where N_(l-1) and N_l are the point set sizes of the input and output of SA layer l, with N_l ≤ N_(l-1); feature propagation is realized by interpolating the feature values f of the N_l points at the coordinates of the N_(l-1) points; among the many choices of interpolation, an inverse-distance weighted average based on the k nearest neighbors is used; the interpolated features at the N_(l-1) points are then concatenated with the skip-linked point features from the set abstraction level, and shared fully connected and ReLU layers are applied to update each point's feature vector; this process is repeated until the features have been propagated to the original set of points;
and finally, obtaining a segmentation result of the point cloud.
CN202310780811.7A (priority date 2023-06-29, filing date 2023-06-29): Point cloud segmentation method based on bilateral feature fusion and vector self-attention. Status: Pending. Published as CN116824143A (en).

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202310780811.7A | 2023-06-29 | 2023-06-29 | Point cloud segmentation method based on bilateral feature fusion and vector self-attention (en)

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202310780811.7A | 2023-06-29 | 2023-06-29 | Point cloud segmentation method based on bilateral feature fusion and vector self-attention (en)

Publications (1)

Publication Number | Publication Date
CN116824143A | 2023-09-29

Family

ID: 88116367

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202310780811.7A | Point cloud segmentation method based on bilateral feature fusion and vector self-attention | 2023-06-29 | 2023-06-29

Country Status (1)

Country | Link
CN | CN116824143A (en)

Cited By (2)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
CN117278150A * | 2023-11-23 | 2023-12-22 | 成都工业学院 | Indoor wireless network signal measurement and calculation method, equipment and medium
CN117278150B * | 2023-11-23 | 2024-02-09 | 成都工业学院 | Indoor wireless network signal measurement and calculation method, equipment and medium


Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination