CN115861619A - Airborne LiDAR (light detection and ranging) urban point cloud semantic segmentation method and system of recursive residual double-attention kernel point convolution network

Info

Publication number
CN115861619A
CN115861619A
Authority
CN
China
Prior art keywords
point cloud
attention
kernel
point
semantic segmentation
Prior art date
Legal status
Pending
Application number
CN202211639217.8A
Other languages
Chinese (zh)
Inventor
罗甫林
曾涛
舒文强
郭坦
马泽忠
罗鼎
李朋龙
Current Assignee
Chongqing University
Original Assignee
Chongqing University
Priority date
Filing date
Publication date
Application filed by Chongqing University
Priority to CN202211639217.8A
Publication of CN115861619A
Legal status: Pending

Abstract

The invention relates to an airborne LiDAR urban point cloud semantic segmentation method and system based on a recursive residual double-attention kernel point convolution network, belonging to the technical field of computer vision and comprising the following steps: S1: acquiring a point cloud of a target area; S2: preprocessing the acquired point cloud of the target area to obtain training sample data and test sample data; S3: inputting the labeled training samples into the recursive residual double-attention kernel point convolution network for training; S4: after training is finished, performing semantic segmentation on the test samples to obtain the result. The method and system outperform other airborne LiDAR urban point cloud semantic segmentation methods, can better acquire and analyze airborne LiDAR urban point clouds, and have advantages over other methods in segmenting point clouds with unbalanced categories.

Description

Airborne LiDAR (light detection and ranging) urban point cloud semantic segmentation method and system of recursive residual double-attention kernel point convolution network
Technical Field
The invention belongs to the technical field of computer vision, and relates to an airborne LiDAR urban point cloud semantic segmentation method and system of a recursive residual double-attention kernel point convolution network.
Background
Laser radar (Light Detection And Ranging, LiDAR for short) is a new three-dimensional remote sensing observation technology. It provides point cloud data reflecting the three-dimensional structure of ground objects, is not affected by illumination and shadow, and therefore has advantages in ground object identification. Airborne LiDAR point cloud data covering a large area can be obtained by mounting laser radar equipment on a manned or unmanned aircraft for aerial scanning. In addition, airborne LiDAR urban point clouds contain a large amount of high-value target information related to human activity and nature, and semantic segmentation of such point clouds is the basis and key of a series of important applications such as urban three-dimensional modeling and high-precision map drawing. However, LiDAR data volumes are large, urban areas are complex, and the geometric attributes of ground features are variable, so existing algorithms struggle to achieve fine semantic segmentation of ground features.
Most traditional point cloud processing methods obtain features from point cloud data through specific hand-crafted definitions and complete the semantic segmentation of scattered point cloud data by training a group of feature classifiers such as random forests, support vector machines, and Gaussian mixture models. Obviously, the performance of such methods relies heavily on expert experience and the classification algorithm. Many scholars at home and abroad have proposed methods that generate three-dimensional descriptors for various application scenarios, such as spin images, Fast Point Feature Histograms (FPFH), Heat Kernel Signatures (HKS), Signatures of Histograms of Orientations (SHOT), and the like. However, these methods estimate the local features of each point independently and predict each label without considering the consistency between neighboring points. Therefore, the segmentation results are often affected by noise and label inconsistency, and such methods cannot be applied to all semantic segmentation scenes.
At present, due to the rapid development of deep learning, deep learning methods applied to three-dimensional point cloud semantic segmentation often outperform traditional point cloud segmentation methods. According to the type of convolution operator, prior art methods fall into two classes: three-dimensional point cloud semantic segmentation based on discrete convolution operators and three-dimensional point cloud semantic segmentation based on continuous convolution operators. Among methods based on discrete convolution, some scholars project the original three-dimensional point cloud from multiple angles onto two-dimensional planes and treat the problem as an image segmentation task, typified by SnapNet. However, such projection-based dimensionality reduction loses spatial information to a certain extent, and very poor results are easily obtained under complex spatial distributions. Others extend the pixel concept to three-dimensional space, organizing the topological relations between spatial points through voxelization before feeding them into a deep learning model, such as SEGCloud. However, this approach consumes excessive memory and has difficulty capturing high-resolution and fine-grained features.
To avoid the tedious processes of multi-directional projection and voxel construction described above, PointNet pioneered a deep learning architecture applied directly to points: it uses a shared multi-layer perceptron and a transformation network to learn the features of each point independently and extracts a global representation with a simple aggregation operation. PointNet++ takes the local structure of the point cloud into account, processes groups of points hierarchically, and then aggregates the obtained local features to generate higher-level features. However, these discrete-convolution-based methods do not achieve significant results on airborne LiDAR urban point clouds.
Methods for three-dimensional point cloud semantic segmentation based on continuous convolution operators define convolution operations in a continuous space, in which the weight of a neighbor is related to its spatial distribution around each central point. Researchers have proposed the kernel point continuous convolution network KPConv; however, that network contains only one kernel point convolution (KPConv) per downsampling layer and cannot achieve highly robust multi-level feature extraction for point clouds of different densities. A local and global encoder network (LGENet) was further proposed for semantic segmentation of airborne LiDAR urban point cloud data; it first extracts features through two-dimensional and three-dimensional kernel point continuous convolutions to learn representative geometric information. However, this approach does not take the global context of the unordered point cloud into account. In addition, the continuous kernel point convolution methods above fail to consider the imbalance and sparsity of airborne LiDAR point cloud data from both local and global perspectives, or how to perform multi-level feature learning, and thus cannot fully identify fine-grained point cloud semantic features.
In summary, the prior art does not fully consider local and global multi-level feature learning in airborne LiDAR urban point cloud semantic segmentation and ignores the imbalance of semantic categories. An airborne LiDAR urban point cloud semantic segmentation deep learning method that can represent global and local features at multiple levels, and thereby improve semantic segmentation accuracy, is therefore of significant research value.
Disclosure of Invention
In view of the above, the present invention provides an airborne LiDAR urban point cloud semantic segmentation method and system based on a recursive residual double-attention kernel point convolution network. The method and system first provide an attention kernel point convolution (AKPConv) module, which weights channel information with the batch normalization scale factor, representing the importance of each channel weight by its standard deviation, and enhances the local feature representation of the point cloud. Based on the AKPConv module, a recursive residual kernel point attention module (RRKA) is provided, which aggregates diversified features of neighborhood points through iterative cumulative learning. A five-layer coding module is built from RRKA modules, and a corresponding five-layer decoding module is then built according to the coding module. Finally, in order to fully fuse the cross-layer features of the coding and decoding layers, a global-local channel attention interaction module (GLCA) is provided to fuse global and local information and improve the discrimination of fine point cloud segmentation. The method and system can better acquire and analyze airborne LiDAR urban point clouds and have advantages over other methods in segmenting point clouds with unbalanced categories.
In order to achieve the purpose, the invention provides the following technical scheme:
An airborne LiDAR urban point cloud semantic segmentation method of a recursive residual double-attention kernel point convolution network, comprising the following steps: S1: acquiring a point cloud of a target area; S2: preprocessing the acquired point cloud of the target area to obtain training sample data and test sample data; S3: inputting the labeled samples into the recursive residual double-attention kernel point convolution network for training; S4: after training is finished, performing semantic segmentation on the test samples to obtain the result.
Further, in step S1, the urban point cloud data are collected with an unmanned aerial vehicle or a manned aerial platform carrying LiDAR, specifically including: S11, selecting the target area to be segmented according to the task requirements of urban point cloud semantic segmentation, setting the flight parameters of the airborne platform, including but not limited to flight height and speed, and planning the flight route, adopting a Z-shaped route; S12, setting the scanning parameters of the LiDAR, considering the overlap rate of the point clouds and ensuring scanning accuracy, selecting a laser radar based on non-repetitive scanning, with an FOV of 70.4° × 77.2°, a ranging accuracy of 3 cm to 1000 cm, and a maximum multi-echo rate of 480000 points/second; S13, during acquisition, carrying the LiDAR on the airborne platform and flying along the planned scanning route to obtain the point cloud data of the target area.
Further, in step S2, the acquired point cloud of the target area is preprocessed, mainly including point cloud registration, noise removal, and radiometric correction; training samples and test samples are then extracted in blocks from the point cloud data, and the training samples are labeled with semantic labels. In particular, to reduce the influence of the long-tailed distribution of the echo intensity of the point cloud data and improve the robustness of the network, the echo intensity data are reshaped toward a normal distribution using a gamma transform:

I_γ = 255 · (I / I_max)^γ

wherein I is the acquired echo intensity and I_max its maximum value; I_γ is the echo intensity after gamma correction, in the range of 0 to 255; γ is a parameter with 0 ≤ γ ≤ 1. Through this formula, the raw echo intensities are mapped to image space.
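The gamma transform above can be sketched in a few lines of numpy; the normalization by the maximum acquired intensity is an assumption of this sketch, since the patent only states that raw intensities with 0 ≤ γ ≤ 1 are mapped to image space:

```python
import numpy as np

def gamma_correct(intensity, gamma=0.5):
    """Map raw echo intensities to [0, 255] with a gamma transform.

    Normalizing by the maximum intensity is an assumption of this sketch;
    with gamma < 1 the power law compresses the long upper tail of the
    intensity distribution, as described in the text.
    """
    intensity = np.asarray(intensity, dtype=np.float64)
    normalized = intensity / intensity.max()   # scale to [0, 1]
    return 255.0 * normalized ** gamma         # compress the long tail

# Hypothetical long-tailed raw echo intensities
raw = np.array([1.0, 10.0, 100.0, 1000.0])
corrected = gamma_correct(raw, gamma=0.5)
```

With γ = 0.5 the four sample intensities, which span three orders of magnitude, are mapped to a much narrower spread within [0, 255] while preserving their ordering.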
Further, in step S3, the gamma-corrected point cloud training data are input into an attention kernel point convolution block (AKPConv), and point cloud features are learned to obtain attention kernel point convolution features; during down-sampling, the AKPConv module can also be used to down-sample the number of points. Specifically, the attention weight is obtained from the scale factor of the batch normalization layer, giving the channel attention feature f_c as follows:

f_out = (f_in − μ_B) / √(σ_B² + ε),  f_c = f_in ⊙ sigmoid(f_out)

wherein f_in is the input point cloud sample feature; μ_B is the mean of f_in and σ_B² its variance; ε is a small constant avoiding a zero denominator, set to 1×10⁻⁵; f_out is the normalized point cloud sample feature; ⊙ is the element-wise product; sigmoid() is the sigmoid function, i.e., sigmoid(x) = 1/(1 + e^(−x)). Then, the attention feature is input into the kernel point convolution to obtain the feature F_1:

F_1 = Conv_{1×1}(KPConv(f_c))

wherein Conv_{1×1} denotes a 1×1 convolution and KPConv(f_c) denotes the kernel point convolution operation, with the specific formula:

KPConv(f_c)(x) = Σ_{x_i ∈ N_x} κ(x_i − x) f_i

wherein N_x = {x_i ∈ x | ‖x_i − x‖ ≤ r} is the neighborhood set of point x at a fixed radius r, x_i is any point in this neighborhood of x, and f_i is the feature corresponding to the point cloud subset x_i. The kernel function κ(·) is as follows:

κ(y) = Σ_{k=1}^{n_k} h(y, ỹ_k) W_k

wherein ỹ_k represents the position of the k-th spherical kernel point in 3D space, n_k denotes the number of kernel points of κ(·), and W_k is the weight matrix of the corresponding kernel point; the correlation function is

h(y, ỹ_k) = max(0, 1 − ‖y − ỹ_k‖ / σ)

where σ is a hyper-parameter used to control the influence range of the kernel points. In order to preserve the input features, a skip connection is added in the AKPConv block; for the skip connection, maxpooling() is an optional max pooling operation used when D_in = 2D, and the skip branch can be expressed as:

F_2 = Conv_{1×1}(maxpooling(f_in))

Finally, the output feature F_AKPConv of AKPConv can be expressed as:

F_AKPConv = ReLU(F_1 + F_2)

where ReLU(x) = max(0, x) denotes the activation function.
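The batch-normalization-based channel attention at the heart of AKPConv can be sketched in numpy; this is a minimal reading of the formulas above, with the learnable scale/shift parameters of batch normalization omitted for brevity:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(f_in, eps=1e-5):
    """BN-style channel attention sketch.

    f_in: (N, C) point features. Each channel is standardized with its
    batch mean/variance (f_out), squashed by a sigmoid, and used to
    reweight the input element-wise:
        f_out = (f_in - mu_B) / sqrt(sigma_B^2 + eps)
        f_c   = f_in * sigmoid(f_out)
    """
    mu = f_in.mean(axis=0, keepdims=True)     # mu_B, per-channel mean
    var = f_in.var(axis=0, keepdims=True)     # sigma_B^2, per-channel variance
    f_out = (f_in - mu) / np.sqrt(var + eps)  # normalized features
    return f_in * sigmoid(f_out)              # element-wise reweighting

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 4))   # 8 points, 4 channels (toy sizes)
f_c = channel_attention(feats)
```

Because the sigmoid gate lies strictly in (0, 1), each attended feature is attenuated relative to its input, with near-average responses suppressed more strongly than outlying ones.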
Further, in order to accumulate aggregated local features and generate diversified features, a recursive residual kernel attention module (RRKA) is composed of a recursive point convolution block (RPConv) built from AKPConv and a multi-layer perceptron (MLP) with a single hidden layer. The recursive point convolution (RPConv) block is mainly used to learn the accumulated neighborhood features of the point cloud, formulated as:

x_l^t = AKPConv(x_l^(t−1)),  t = 1, 2, 3, ..., T

wherein x_l^0 is the RPConv input of level l for T recursions; AKPConv() denotes the attention kernel point convolution operation; x_l^t is the output after t recursions, and x_l^(t−1) is the output of recursion t − 1. To maintain the computational efficiency of RPConv, an MLP is first used to compress the feature dimension before RPConv is applied, and an MLP is then used to restore the feature dimension. A compression loop block (CRB) based on RPConv and MLP is therefore constructed to improve computational efficiency:

x_cout = ReLU(BN(W_2 * RPConv(BN(W_1 * x_cin))))

wherein W_1 and W_2 are the learnable parameters of the MLPs, RPConv() is the recursive point convolution operation, and BN() is the batch normalization operation. On the basis of CRB, a recursive residual kernel attention (RRKA) module with a residual connection is developed; this module effectively applies repeated operations to local features and enhances the diversity of point cloud feature representations. The RRKA output can be expressed as:

x_out = CRB(CRB(W_0 * x_in)) + x_in

wherein W_0 is the learnable weight of the MLP, x_in is the input point cloud feature, and CRB() is the CRB operation.
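The RPConv / CRB / RRKA composition above can be sketched with plain matrix operations; the toy channel-mixing operator standing in for AKPConv, the weight shapes, and the omission of batch normalization are all assumptions of this sketch:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def rpconv(x, conv, T=3):
    """Recursive point convolution: apply the same shared-weight operator
    T times, x^t = conv(x^{t-1}), accumulating neighborhood features."""
    for _ in range(T):
        x = conv(x)
    return x

def crb(x, w1, w2, conv, T=3):
    """Compression loop block: MLP (w1) compresses the channel dimension,
    RPConv runs on compressed features, MLP (w2) restores the dimension.
    x_cout = ReLU(W2 * RPConv(W1 * x)); batch norm omitted in this sketch."""
    return relu(rpconv(x @ w1, conv, T) @ w2)

def rrka(x, w0, w1, w2, conv, T=3):
    """RRKA: two stacked CRBs plus a residual connection,
    x_out = CRB(CRB(W0 * x)) + x."""
    return crb(crb(x @ w0, w1, w2, conv, T), w1, w2, conv, T) + x

# Toy stand-in for AKPConv: a fixed channel-mixing matrix (hypothetical)
rng = np.random.default_rng(1)
C, Cr = 8, 4                           # full and compressed channel widths
mix = rng.normal(size=(Cr, Cr)) * 0.1
conv = lambda z: relu(z @ mix)
w0 = np.eye(C)
w1 = rng.normal(size=(C, Cr)) * 0.1    # compress C -> Cr
w2 = rng.normal(size=(Cr, C)) * 0.1    # restore Cr -> C
x = rng.normal(size=(16, C))           # 16 points, C channels
out = rrka(x, w0, w1, w2, conv, T=3)
```

The residual connection guarantees the output keeps the input shape, so RRKA blocks can be stacked to form the five-layer encoder described in the text.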
Further, after the five coding layers described above, decoding operations are performed; after the low-level and high-level features are concatenated, a global-local channel attention module (GLCA) is applied to the feature tensor. First, a fully connected layer realizes channel information fusion of the low-level and high-level features in the global space, giving x_g as follows:

x_g = BN(W_g * x_in)

wherein the module input is x_in ∈ R^(N×C), N and C are the number of up-sampled points and the feature dimension, and W_g is the fully connected weight fusing low-level and high-level features. In the local feature computation, attention weights are obtained using average pooling and a one-dimensional convolution:

ω = sigmoid(W_k * (avgpooling(x_g)))

wherein avgpooling(): R^(N×C) → R^C is the channel-wise average pooling operation and W_k is a learnable local one-dimensional convolution kernel of size k = 5. From the attention weight ω, the local attention feature x_l can be obtained (⊙ denotes element-by-element multiplication):

x_l = BN(ω ⊙ x_g)

The output x_out of the global-local cross-layer information interaction module GLCA is expressed as:

x_out = x_g + x_l
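A minimal numpy sketch of the GLCA computation follows; the edge-padding of the pooled channel vector and the omission of batch normalization are assumptions, and the identity fusion weight is purely illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def glca(x_in, w_g, w_k):
    """Global-local channel attention sketch.

    Global branch: fully connected fusion x_g = x_in @ w_g (BN omitted).
    Local branch: average-pool over the N points, slide a 1-D kernel of
    size k across channels to get per-channel weights omega, then
    x_l = omega * x_g. The module output is x_g + x_l.
    """
    x_g = x_in @ w_g                      # global channel fusion
    z = x_g.mean(axis=0)                  # channel-wise average pooling, shape (C,)
    k = len(w_k)
    zp = np.pad(z, k // 2, mode="edge")   # pad so each channel sees k neighbors
    omega = sigmoid(np.array([zp[c:c + k] @ w_k for c in range(len(z))]))
    x_l = omega * x_g                     # local attention feature
    return x_g + x_l

rng = np.random.default_rng(2)
x_in = rng.normal(size=(32, 16))   # N=32 up-sampled points, C=16 channels
w_g = np.eye(16)                   # illustrative fusion weight
w_k = rng.normal(size=5) * 0.1     # 1-D convolution kernel, k = 5
out = glca(x_in, w_g, w_k)
```

The local branch only mixes each channel with its k nearest channels, which keeps the attention cost linear in C while the global branch handles full cross-channel fusion.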
further, the feature tensor passes through two full-connection layers, semantic segmentation results are obtained through a Sigmoid activation function, a focus loss function is introduced for solving the problem of data imbalance, and the loss function is set as follows:
Figure BDA0004007980020000055
wherein λ is typically set to 2, α t Representing class weight parameters, N representing the number of point clouds, p jc Indicates that the jth sample is includedc probability of class; optimizing model parameters of the semantic segmentation framework by using a stochastic gradient descent method according to the focus loss function, and obtaining a trained semantic segmentation framework after training is completed; and judging the input test sample through the trained semantic segmentation frame, and outputting a semantic segmentation result.
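The focal loss can be sketched as follows; reading α_t as a per-class weight indexed by the true class, and restricting the inner sum to the true class of each point, are assumptions of this sketch:

```python
import numpy as np

def focal_loss(probs, labels, alpha, lam=2.0):
    """Focal loss over N points, following the reconstructed formula
    L = -(1/N) sum_j alpha_c (1 - p_{j,c})^lam log(p_{j,c}),
    where c is the true class of point j.

    probs:  (N, C) predicted class probabilities
    labels: (N,)   true class indices
    alpha:  (C,)   per-class weights (one common reading of alpha_t)
    lam:    focusing parameter, typically 2
    """
    n = len(labels)
    p_t = probs[np.arange(n), labels]   # probability of the true class
    a_t = alpha[labels]                 # class weight for each point
    return float(np.mean(-a_t * (1.0 - p_t) ** lam * np.log(p_t)))

probs = np.array([[0.9, 0.1],    # confident, correct
                  [0.6, 0.4],    # less confident
                  [0.2, 0.8]])   # confident, correct (class 1)
labels = np.array([0, 0, 1])
alpha = np.array([1.0, 1.0])
loss = focal_loss(probs, labels, alpha, lam=2.0)
```

The (1 − p)^λ factor down-weights points the network already classifies confidently, so rare, hard classes dominate the gradient, which is exactly why the loss suits class-imbalanced point clouds.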
The invention has the beneficial effects that:
the attention kernel point convolution representation module can effectively learn local neighborhood characteristics of the point cloud, weight channel information by using batch normalization scale factors, and represent the importance of channel weight by using standard deviation. The invention provides a recursive residual error kernel attention module which takes an attention kernel convolution module as a key point to mine multi-level point cloud local information and generate low-level semantic features with discriminative power. The global-local channel attention module provided by the invention fuses the up-sampled high-level features and the low-level features connected in a jumper connection, realizes global and local information interaction, and improves the fine point cloud segmentation effect. The invention provides an airborne LiDAR urban point cloud semantic segmentation method RRDAN of a recursive residual error double-attention kernel point convolution network, which is concentrated in multi-level feature representation learning and has strong representation capability on the airborne LiDAR urban point cloud with unbalanced category. Experimental results on two airborne LiDAR urban point cloud data sets show that the performance of the RRDAN is superior to the most advanced airborne LiDAR urban point cloud semantic segmentation method at present.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of an airborne platform collecting point cloud;
FIG. 3 is a diagram of a recursive residual double attention kernel convolutional network (RRDAN) for on-board LiDAR city point cloud semantic segmentation;
FIG. 4 is a block diagram of the attention kernel convolution module AKPConv of the present invention;
FIG. 5 is a block diagram of the recursive residual error kernel attention module RRKA of the present invention;
FIG. 6 is a block diagram of a global-local channel attention interaction module GLCA in accordance with the present invention;
FIG. 7 is a graph of the results of an RRDAN network of the present invention;
fig. 8 is an error chart of the experimental result of the RRDAN network of the present invention.
Detailed Description
The technical scheme of the invention is explained in detail in the following with the accompanying drawings.
FIG. 1 is a flow chart of the method of the invention; the invention provides an airborne LiDAR urban point cloud semantic segmentation method and system of a recursive residual double-attention kernel point convolution network. The deep learning network for semantic segmentation is shown in FIG. 3 and can learn fine and representative features from airborne point clouds. The network consists of an attention kernel point convolution module (AKPConv), a recursive residual kernel attention module (RRKA), and a global-local channel attention interaction module (GLCA). First, local features of the point cloud data, after gamma correction of the reflection intensity, are obtained through three layers of attention kernel point convolution modules; semantic features with multi-level fine representation capability are then cumulatively learned from these local features through the RRKA modules. Then, after concatenation of the low-level and high-level features, a global-local channel attention module (GLCA) is applied to the feature tensor to learn the fused semantic information. The attention kernel point convolution module (AKPConv) designed by the invention extracts local features of the point cloud neighborhood, embedding an attention mechanism to weight channel features and adding a skip connection to fuse context information. The recursive residual kernel point attention module (RRKA) provided by the invention aggregates diversified features of neighborhood points through iterative cumulative learning. Through the network's dual attention mechanism, subtle feature representations are enhanced to improve segmentation performance.
The invention provides a multilevel double-attention-core point convolution network which takes key information recursive accumulation learning as a key point to mine the intrinsic information of the airborne point cloud with unbalanced category distribution and generate high-level semantic features with discriminative power.
Specifically, the technical scheme of the invention comprises the following contents:
1. Data acquisition: urban point cloud data are collected with an unmanned aerial vehicle or a manned aerial platform carrying LiDAR. First, according to the requirements of the urban point cloud semantic segmentation task, the target area to be segmented is selected, the flight parameters of the airborne platform (including flight height and speed) are set, and the flight route is planned, generally as a Z-shaped route. Then, the scanning parameters of the LiDAR are set; considering the point cloud density and ensuring scanning accuracy, a laser radar based on non-repetitive scanning is selected, with an FOV of 70.4° × 77.2°, a ranging accuracy of 3 cm to 1000 cm, and a maximum multi-echo rate of 480000 points/second. During acquisition, the airborne platform carries the LiDAR and flies along the planned scanning route to obtain the point cloud data of the target area, as shown schematically in FIG. 2.
2. Data preprocessing: the acquired point cloud of the target area is preprocessed, mainly including point cloud registration, noise removal, and radiometric correction; training samples and test samples are then extracted in blocks from the point cloud data, and the training samples are labeled with semantic labels. In particular, to reduce the influence of the long-tailed distribution of the echo intensity of the point cloud data and improve the robustness of the network, the echo intensity data are reshaped toward a normal distribution using a gamma transform:

I_γ = 255 · (I / I_max)^γ

wherein I is the acquired echo intensity and I_max its maximum value; I_γ is the echo intensity after gamma correction, ranging from 0 to 255; γ is a parameter with 0 ≤ γ ≤ 1. Through this formula, the raw echo intensities are mapped to image space.
3. The gamma-corrected point cloud training data are input into an attention kernel point convolution block (AKPConv), as shown in FIG. 4, and point cloud features are learned to obtain attention kernel point convolution features; during down-sampling, the AKPConv module can also be used to down-sample the number of points. Specifically, the attention weight is obtained from the scale factor of the batch normalization layer, giving the channel attention feature f_c as follows:

f_out = (f_in − μ_B) / √(σ_B² + ε),  f_c = f_in ⊙ sigmoid(f_out)

wherein f_in is the input point cloud sample feature; μ_B is the mean of f_in and σ_B² its variance; ε is a small constant avoiding a zero denominator, set to 1×10⁻⁵; f_out is the normalized point cloud sample feature; ⊙ is the element-wise product; sigmoid() is the sigmoid function, i.e., sigmoid(x) = 1/(1 + e^(−x)). Then, the attention feature is input into the kernel point convolution to obtain the feature F_1:

F_1 = Conv_{1×1}(KPConv(f_c))

wherein Conv_{1×1} denotes a 1×1 convolution and KPConv() denotes the kernel point convolution operation, with the specific formula:

KPConv(f_c)(x) = Σ_{x_i ∈ N_x} κ(x_i − x) f_i

wherein N_x = {x_i ∈ x | ‖x_i − x‖ ≤ r} is the neighborhood set of point x at a fixed radius r, x_i is any point in this neighborhood, and f_i is the feature corresponding to the point cloud subset x_i. The kernel function κ(·) is as follows:

κ(y) = Σ_{k=1}^{n_k} h(y, ỹ_k) W_k

wherein ỹ_k represents the position of the k-th spherical kernel point in 3D space, n_k denotes the number of kernel points of κ(·), and W_k is the weight matrix of the corresponding kernel point; the correlation function is

h(y, ỹ_k) = max(0, 1 − ‖y − ỹ_k‖ / σ)

where σ is a hyper-parameter used to control the influence range of the kernel points.
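A single-center numpy sketch of the kernel point convolution and its linear correlation function follows; the toy offsets, kernel point positions, channel widths, and σ value are all hypothetical:

```python
import numpy as np

def kernel_correlation(y, y_k, sigma):
    """Linear correlation h(y, y_k) = max(0, 1 - ||y - y_k|| / sigma)
    between a neighbor offset y and a kernel point y_k."""
    return max(0.0, 1.0 - np.linalg.norm(y - y_k) / sigma)

def kpconv_point(offsets, feats, kernel_pts, weights, sigma=0.3):
    """Kernel point convolution at one center point: sum over neighbors
    of kappa(x_i - x) f_i, with kappa(y) = sum_k h(y, y_k) W_k."""
    out = np.zeros(weights.shape[-1])
    for y, f in zip(offsets, feats):           # neighbors in N_x
        for yk, w in zip(kernel_pts, weights): # kernel points and their W_k
            out += kernel_correlation(y, yk, sigma) * (f @ w)
    return out

# Toy setup: 2 neighbors, 3 kernel points, 2 -> 2 channels (hypothetical)
rng = np.random.default_rng(3)
offsets = np.array([[0.05, 0.0, 0.0],
                    [0.0, 0.1, 0.0]])          # x_i - x
feats = rng.normal(size=(2, 2))                # f_i
kernel_pts = np.array([[0.0, 0.0, 0.0],
                       [0.2, 0.0, 0.0],
                       [0.0, 0.2, 0.0]])       # spherical kernel point positions
weights = rng.normal(size=(3, 2, 2))           # W_k, one matrix per kernel point
out = kpconv_point(offsets, feats, kernel_pts, weights)
```

Each neighbor's feature is mixed by every kernel point's weight matrix, scaled by how close the neighbor lies to that kernel point; neighbors farther than σ from a kernel point contribute nothing through it.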
In order to preserve the input features, a skip connection is added in the AKPConv block; for the skip connection, maxpooling() is an optional max pooling operation, used when D_in = 2D, and the skip branch can be expressed as:

F_2 = Conv_{1×1}(maxpooling(f_in))

Finally, the output feature F_AKPConv of AKPConv can be expressed as:

F_AKPConv = ReLU(F_1 + F_2)

where ReLU(x) = max(0, x) denotes the activation function.
4. In order to accumulate aggregated local features and generate diversified features, a recursive residual kernel attention module (RRKA) is composed of a recursive point convolution block (RPConv) built from AKPConv and a multi-layer perceptron (MLP) with a single hidden layer, as shown in FIG. 5. The recursive point convolution (RPConv) block is mainly used to learn the accumulated neighborhood features of the point cloud, formulated as:

x_l^t = AKPConv(x_l^(t−1)),  t = 1, 2, 3, ..., T

wherein x_l^0 is the RPConv input of level l for T recursions; AKPConv() denotes the attention kernel point convolution operation; x_l^t is the output after t recursions, and x_l^(t−1) is the output of recursion t − 1.
To maintain the computational efficiency of RPConv, an MLP is first employed to compress the feature dimension before RPConv is used, and an MLP is then employed to restore it. A compression loop block (CRB) based on RPConv and MLP is therefore constructed to improve computational efficiency:

x_cout = ReLU(BN(W_2 * RPConv(BN(W_1 * x_cin))))

wherein W_1 and W_2 are the learnable parameters of the MLPs, RPConv() is the recursive point convolution operation, and BN() is the batch normalization operation.
On the basis of CRB, a recursive residual kernel attention (RRKA) module with a residual connection is developed; this module effectively applies repeated operations to local features, enhances the diversity of point cloud feature representations, and its output can be expressed as follows:
x out =CRB(CRB(W 0 *x in ))+x in
wherein W is 0 Learning parameters, x, for the weights of MLP in For the input point cloud feature, CRB () is a CRB operation.
5. After the five layers of the above coding modules, a decoding operation is performed; after concatenation of the low-level and high-level features, a global-local channel attention (GLCA) module is applied to the feature tensor, as shown in fig. 6.

The method first adopts a fully connected layer to fuse the channel information of the low-level and high-level features over the global space:

x_g = BN(W_g * x_in)

wherein the module input is x_in ∈ R^{N×C}, N and C are the number of up-sampled points and the feature dimension, and W_g is the fully connected weight fusing the low-level and high-level features.
In the local feature computation, attention weights are obtained using average pooling and a one-dimensional convolution:

ω = sigmoid(W_k * avgpooling(x_g))

wherein avgpooling(·) averages x_g over the points to give one value per channel, and W_k is a learnable local one-dimensional convolution kernel of size k = 5 applied across neighboring channels, i.e. for channel i:

ω_i = sigmoid(Σ_{j=1}^{k} W_k^j g_i^j), g_i^j ∈ Ω_i^k

wherein g = avgpooling(x_g) and Ω_i^k is the set of k channels adjacent to channel i. From the attention weight ω, the local attention feature x_l is obtained (⊙ denotes element-wise multiplication):

x_l = BN(ω ⊙ x_g)
output result x of global-local cross-layer information interaction module GLCA out Expressed as:
x out =x g +x l
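The GLCA computation above can be sketched compactly in NumPy. This is an illustrative sketch under assumptions: batch normalization is omitted, and the channel-wise 1-D convolution uses edge padding (the padding scheme is not specified in the text).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def glca(x_in, w_g, w_k):
    # x_in: (N, C) up-sampled point features after low/high-level concatenation
    # global branch: fully connected fusion across channels, x_g = W_g * x_in
    x_g = x_in @ w_g                               # (N, C)
    # local branch: channel-wise average pooling over all N points
    pooled = x_g.mean(axis=0)                      # (C,)
    # 1-D convolution of size k over neighboring channels
    k = len(w_k)
    padded = np.pad(pooled, k // 2, mode="edge")
    conv = np.array([padded[i:i + k] @ w_k for i in range(len(pooled))])
    omega = sigmoid(conv)                          # (C,) attention weights
    x_l = omega * x_g                              # broadcast over the N points
    return x_g + x_l                               # sum of global and local outputs

rng = np.random.default_rng(1)
N, C = 64, 16
x = rng.standard_normal((N, C))
w_g = rng.standard_normal((C, C)) * 0.1
w_k = rng.standard_normal(5) * 0.1                 # kernel size k = 5 as in the text
out = glca(x, w_g, w_k)
print(out.shape)  # (64, 16)
```

Because the attention weight ω has one entry per channel, the local branch re-weights every point's features with the same per-channel mask before the residual sum with the global branch.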
6. The feature tensor then passes through two fully connected layers, and the semantic segmentation result is obtained through a Sigmoid activation function. A focal loss function is introduced to address the class imbalance problem:

L = -(1/N) Σ_{j=1}^{N} α_t (1 - p_{jc})^λ log(p_{jc})

wherein λ is generally set to 2, α_t denotes the class weight parameter, N denotes the number of points, and p_{jc} denotes the predicted probability that the j-th sample belongs to its true class c. The model parameters of the semantic segmentation framework are optimized with the stochastic gradient descent method on the focal loss, and the trained framework is obtained after training; the input test samples are then classified by the trained semantic segmentation framework, which outputs the semantic segmentation result.
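A minimal NumPy sketch of the focal loss on per-point class probabilities; the α values and the toy probabilities below are illustrative, not from the patent.

```python
import numpy as np

def focal_loss(probs, labels, alpha, lam=2.0):
    # Mean over N points of -alpha_t * (1 - p_t)^lambda * log(p_t),
    # where p_t is the predicted probability of each point's true class.
    n = probs.shape[0]
    p_t = probs[np.arange(n), labels]      # probability assigned to the true class
    a_t = alpha[labels]                    # per-class weight alpha_t
    return float(np.mean(-a_t * (1.0 - p_t) ** lam * np.log(p_t + 1e-12)))

# toy example: 3 points, 2 classes; the rarer class gets the larger alpha
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
labels = np.array([0, 1, 1])
alpha = np.array([0.25, 0.75])
print(round(focal_loss(probs, labels, alpha), 4))  # → 0.0848
```

The (1 - p_t)^λ factor down-weights confidently classified points (the first two rows) so that the loss is dominated by the poorly classified third point, which is the mechanism that counters class imbalance.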
Fig. 7 shows the experimental results of the RRDAN semantic segmentation network of the present invention on the open-source airborne urban point cloud data set ISPRS, and fig. 8 shows the error map; the 9 classes in the test area are well segmented. The segmentation effect of the invention is further illustrated by comparative experiments on the ISPRS data set against existing methods such as LUH, RIT_1, alsNet, KPConv, DPE, GANet, DANCE-NET, D-FCN, RandLA-Net, GraNet and LGENet. The Overall Accuracy (OA) and average F1 score (Avg.F1) are computed for each method, as shown in Table 1: a larger OA means a higher proportion of correctly predicted points, and a larger Avg.F1 indicates a better overall evaluation of the results.

Table 1. Comparison of RRDAN with the various methods on the ISPRS data set
[Table 1 is provided as an image in the original publication and is not reproduced here.]

It can be seen that the method of the present invention achieves the best OA and Avg.F1 on this data set, outperforming the other airborne LiDAR urban point cloud semantic segmentation methods, with particular advantages on the car, roof, facade, low-shrub and tree classes.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications may be made to the technical solutions of the present invention without departing from their spirit and scope, and all such modifications shall be covered by the claims of the present invention.

Claims (8)

1. An airborne LiDAR urban point cloud semantic segmentation method of a recursive residual double-attention kernel point convolution network, characterized by comprising the following steps:
S1: acquiring a point cloud of a target area;
S2: preprocessing the acquired point cloud of the target area to obtain training sample data and test sample data;
S3: inputting the labeled training samples into the recursive residual double-attention kernel point convolution network for training;
S4: after training is finished, performing semantic segmentation on the test samples and obtaining the result.
2. The airborne LiDAR urban point cloud semantic segmentation method of a recursive residual double-attention kernel point convolution network according to claim 1, characterized in that: in step S1, the urban point cloud data are collected with an unmanned aerial vehicle or a manned platform carrying the LiDAR, specifically comprising: S11, selecting the target area to be segmented according to the task requirements of urban point cloud semantic segmentation, setting the flight parameters of the airborne platform, including but not limited to flight height and speed, and planning the flight route, adopting a Z-shaped (zigzag) route; S12, setting the scanning parameters of the LiDAR in view of the point cloud overlap rate so as to ensure scanning accuracy, selecting a laser radar based on non-repetitive scanning, with an FOV of 70.4° × 77.2°, a ranging accuracy of 3 cm to 1000 cm, and a maximum multi-echo rate of 480000 points/second; S13, during acquisition, carrying the LiDAR on the airborne platform, flying along the set scanning route, and collecting the point cloud data of the target area.
3. The airborne LiDAR urban point cloud semantic segmentation method of a recursive residual double-attention kernel point convolution network according to claim 2, characterized in that: in step S2, the acquired point cloud of the target area is preprocessed, mainly comprising point cloud registration, noise removal and radiation correction; training samples and test samples are then extracted in blocks from the point cloud data, and the training samples are annotated with semantic labels; in particular, in order to reduce the influence of the long-tailed distribution of the echo intensity of the point cloud data, the echo intensity is reshaped toward a normal distribution by a gamma transform, improving the robustness of the network:

I_γ = 255 × (I / I_max)^γ

wherein I is the acquired echo intensity, I_max is its maximum value, I_γ is the gamma-corrected echo intensity in the range 0 to 255, and γ is a parameter with 0 ≤ γ ≤ 1; through this formula, the raw echo intensities are mapped to image space.
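The gamma transform of the echo intensity can be sketched as follows; note that the normalization by the maximum intensity is an assumption made for illustration (the image placeholder for the original formula is not recoverable), as is the choice γ = 0.5.

```python
import numpy as np

def gamma_correct(intensity, gamma=0.5):
    # Map raw (long-tailed) echo intensities into [0, 255]:
    # I_gamma = 255 * (I / I_max)^gamma, with 0 <= gamma <= 1.
    i_max = float(np.max(intensity))
    return 255.0 * (intensity / i_max) ** gamma

raw = np.array([10.0, 100.0, 1000.0, 10000.0])  # long-tailed intensities
print(np.round(gamma_correct(raw), 1))
```

With γ < 1 the transform is concave, so the small intensities that dominate a long-tailed distribution are spread over a wider portion of the 0-255 range while the rare large values are compressed.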
4. The airborne LiDAR urban point cloud semantic segmentation method of a recursive residual double-attention kernel point convolution network according to claim 3, characterized in that: in step S3, the gamma-corrected point cloud training data are input into the attention kernel point convolution (AKPConv) block, which learns the point cloud features to obtain attention kernel convolution features; during down-sampling, the AKPConv module also reduces the number of points; specifically, the attention weight is obtained from the scale factor of a sample normalization layer, giving the channel attention feature f_c:

f_out = (f_in − μ) / sqrt(σ² + ε)
f_c = sigmoid(f_out) ⊙ f_in

wherein f_in is the input point cloud sample feature, μ is the mean of f_in and σ² its variance; ε is a small constant set to 1×10⁻⁵ to avoid a zero denominator; f_out is the normalized point cloud sample feature; ⊙ is the element-wise product; sigmoid() is the sigmoid function, sigmoid(x) = 1/(1 + e^{−x}); the attention feature is then passed through the kernel point convolution to obtain the feature F_1:
F_1 = Conv_{1×1}(KPConv(f_c))
wherein Conv_{1×1}() denotes a 1×1 convolution and KPConv() denotes the kernel point convolution operation:

KPConv(f_c)(x) = Σ_{x_i ∈ N_x} κ(x_i − x) f_i

wherein N_x is the neighborhood set of point x within a fixed radius r (r ∈ R), i.e. N_x = {x_i ∈ x | ||x_i − x|| ≤ r}, x_i is any point in the neighborhood of x, and f_i is the feature corresponding to x_i; the kernel function κ(·) is defined as:

κ(y) = Σ_{k=1}^{n_k} h(y, x̃_k) W_k

wherein x̃_k denotes the position of the k-th spherical kernel point in 3D space, n_k denotes the number of kernel points of κ(·), and W_k is the weight matrix of the corresponding kernel point; the correlation function is

h(y, x̃_k) = max(0, 1 − ||y − x̃_k|| / σ)

wherein σ is a hyper-parameter controlling the influence distance of the kernel points; in order to preserve the input features, a skip connection is added to the AKPConv block; for the skip connection, maxpooling() is an optional max pooling operation, used when D_in = 2D, and the skip branch can be expressed as:

F_2 = Conv_{1×1}(maxpooling(f_in))
finally, the output characteristic F of AKPConv AKPConv Can be expressed as:
F AKPConv =ReLU(F 1 +F 2 )
where ReLU (x) = max (0, x) denotes an activation function.
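The kernel point convolution at a single query point can be sketched directly from the formulas for κ(·) and the correlation h. This is a minimal sketch: the neighborhood search, bias, and activation are omitted, and all shapes and values below are illustrative.

```python
import numpy as np

def kpconv_point(x, neighbors, feats, kernel_pts, weights, sigma=0.3):
    # Kernel point convolution at query point x:
    # sum over neighbors x_i of kappa(x_i - x) f_i, where
    # kappa(y) = sum_k h(y, xk) W_k and h(y, xk) = max(0, 1 - ||y - xk|| / sigma).
    out = np.zeros(weights.shape[2])
    for xi, fi in zip(neighbors, feats):
        y = xi - x                                  # relative neighbor position
        for xk, wk in zip(kernel_pts, weights):
            h = max(0.0, 1.0 - np.linalg.norm(y - xk) / sigma)
            out += h * (fi @ wk)                    # correlation-weighted projection
    return out

rng = np.random.default_rng(2)
d_in, d_out, n_k = 4, 8, 5
x = np.zeros(3)                                     # query point at the origin
neighbors = rng.uniform(-0.2, 0.2, (6, 3))          # 6 points within the radius r
feats = rng.standard_normal((6, d_in))              # features f_i of the neighbors
kernel_pts = rng.uniform(-0.2, 0.2, (n_k, 3))       # spherical kernel point positions
weights = rng.standard_normal((n_k, d_in, d_out)) * 0.1
print(kpconv_point(x, neighbors, feats, kernel_pts, weights).shape)  # (8,)
```

Each neighbor contributes through every kernel point, scaled by the linear correlation h, which vanishes for neighbors farther than σ from a kernel point; this is what makes the convolution sensitive to local geometry.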
5. The airborne LiDAR urban point cloud semantic segmentation method of a recursive residual double-attention kernel point convolution network according to claim 4, characterized in that: in order to accumulate aggregated local features and generate diversified features, a recursive residual kernel attention (RRKA) module is composed of a recursive point convolution (RPConv) block built from AKPConv and a multi-layer perceptron (MLP) with a single hidden layer; the RPConv block is mainly used for learning the accumulated neighborhood features of the point cloud:

x^{l,t} = AKPConv(x^{l,t-1}), t = 1, 2, …, T

wherein x^{l,t} is the output of the t-th of T recursions of RPConv at level l and x^{l,t-1} is the output of the (t-1)-th recursion (x^{l,0} being the input features); AKPConv() denotes the attention kernel point convolution operation; to keep RPConv computationally efficient, an MLP first compresses the feature dimension before RPConv is applied and a second MLP then restores it; a compression loop block (CRB) based on RPConv and MLP is therefore constructed to improve computational efficiency:
x_cout = ReLU(BN(W_2 * RPConv(BN(W_1 * x_cin))))
wherein W_1 and W_2 are the learnable parameters of the MLPs, RPConv() is the recursive point convolution operation, and BN() is a batch normalization operation; on the basis of the CRB, a recursive residual kernel attention (RRKA) module with a residual connection is developed, which effectively applies repeated operations to local features and enhances the diversity of the point cloud feature representation; the RRKA output can be expressed as:
x_out = CRB(CRB(W_0 * x_in)) + x_in
wherein W_0 is a learnable MLP weight, x_in is the input point cloud feature, and CRB() is a CRB operation.
6. The airborne LiDAR urban point cloud semantic segmentation method of a recursive residual double-attention kernel point convolution network according to claim 5, characterized in that: a decoding operation is performed after the five layers of coding modules, and after the low-level and high-level features are concatenated, a global-local channel attention (GLCA) module is applied to the feature tensor; firstly, a fully connected layer is adopted to fuse the channel information of the low-level and high-level features over the global space:

x_g = BN(W_g * x_in)

wherein the module input is x_in ∈ R^{N×C}, N and C are the number of up-sampled points and the feature dimension, and W_g is the fully connected weight fusing the low-level and high-level features; in the local feature computation, attention weights are obtained using average pooling and a one-dimensional convolution:
ω = sigmoid(W_k * avgpooling(x_g))
wherein avgpooling(·) averages x_g over the points to give one value per channel, and W_k is a learnable local one-dimensional convolution kernel of size k = 5 applied across neighboring channels, i.e. for channel i:

ω_i = sigmoid(Σ_{j=1}^{k} W_k^j g_i^j), g_i^j ∈ Ω_i^k

wherein g = avgpooling(x_g) and Ω_i^k is the set of k channels adjacent to channel i; from the attention weight ω, the local attention feature x_l is obtained (⊙ denotes element-wise multiplication):

x_l = BN(ω ⊙ x_g)

the output result x_out of the global-local cross-layer information interaction (GLCA) module is expressed as:

x_out = x_g + x_l.
7. The airborne LiDAR urban point cloud semantic segmentation method of a recursive residual double-attention kernel point convolution network according to claim 6, characterized in that: the feature tensor passes through two fully connected layers, and the semantic segmentation result is obtained through a Sigmoid activation function; a focal loss function is introduced to address the class imbalance problem:

L = -(1/N) Σ_{j=1}^{N} α_t (1 - p_{jc})^λ log(p_{jc})

wherein λ is generally set to 2, α_t denotes the class weight parameter, N denotes the number of points, and p_{jc} denotes the predicted probability that the j-th sample belongs to its true class c; the model parameters of the semantic segmentation framework are optimized with the stochastic gradient descent method on the focal loss, and the trained framework is obtained after training; the input test samples are classified by the trained semantic segmentation framework, and the semantic segmentation result is output.
8. An airborne LiDAR urban point cloud semantic segmentation system of a recursive residual double-attention kernel point convolution network, characterized in that: the system employs the method of any one of claims 1 to 7.
CN202211639217.8A 2022-12-20 2022-12-20 Airborne LiDAR (light detection and ranging) urban point cloud semantic segmentation method and system of recursive residual double-attention kernel point convolution network Pending CN115861619A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211639217.8A CN115861619A (en) 2022-12-20 2022-12-20 Airborne LiDAR (light detection and ranging) urban point cloud semantic segmentation method and system of recursive residual double-attention kernel point convolution network


Publications (1)

Publication Number Publication Date
CN115861619A true CN115861619A (en) 2023-03-28

Family

ID=85674419


Country Status (1)

Country Link
CN (1) CN115861619A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116413740A (en) * 2023-06-09 2023-07-11 广汽埃安新能源汽车股份有限公司 Laser radar point cloud ground detection method and device
CN116468892A (en) * 2023-04-24 2023-07-21 北京中科睿途科技有限公司 Semantic segmentation method and device of three-dimensional point cloud, electronic equipment and storage medium
CN116958557A (en) * 2023-08-11 2023-10-27 安徽大学 Three-dimensional indoor scene semantic segmentation method based on residual impulse neural network
CN117541799A (en) * 2024-01-09 2024-02-09 四川大学 Large-scale point cloud semantic segmentation method based on online random forest model multiplexing



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination