CN115861619A - Airborne LiDAR (light detection and ranging) urban point cloud semantic segmentation method and system of recursive residual double-attention kernel point convolution network

Info

Publication number
CN115861619A
CN115861619A
Authority
CN
China
Prior art keywords
point cloud
attention
kernel
point
semantic segmentation
Prior art date
Legal status
Pending
Application number
CN202211639217.8A
Other languages
Chinese (zh)
Inventor
罗甫林
曾涛
舒文强
郭坦
马泽忠
罗鼎
李朋龙
Current Assignee
Chongqing University
Original Assignee
Chongqing University
Priority date
Filing date
Publication date
Application filed by Chongqing University
Priority to CN202211639217.8A
Publication of CN115861619A
Legal status: Pending

Abstract

The invention relates to an airborne LiDAR urban point cloud semantic segmentation method and system based on a recursive residual double-attention kernel point convolution network, belonging to the technical field of computer vision and comprising the following steps: S1: acquiring a point cloud of a target area; S2: preprocessing the acquired point cloud of the target area to obtain training sample data and test sample data; S3: inputting the labeled training samples into the recursive residual double-attention kernel point convolution network for training; S4: after training is finished, performing semantic segmentation on the test samples to obtain the result. The method and system outperform other airborne LiDAR urban point cloud semantic segmentation methods, can better acquire and analyze airborne LiDAR urban point clouds, and have advantages over other methods in segmenting point clouds with unbalanced categories.

Description

Airborne LiDAR (light detection and ranging) urban point cloud semantic segmentation method and system of recursive residual double-attention kernel point convolution network
Technical Field
The invention belongs to the technical field of computer vision, and relates to an airborne LiDAR urban point cloud semantic segmentation method and system of a recursive residual double-attention kernel point convolution network.
Background
Laser radar (Light Detection And Ranging, LiDAR for short) is a new three-dimensional remote sensing observation technology. It provides point cloud data reflecting the three-dimensional structure of ground objects, is not affected by illumination and shadow, and therefore has advantages in ground object identification. Airborne LiDAR point cloud data covering a large area can be obtained by mounting laser radar equipment on a manned or unmanned aircraft for aerial scanning. In addition, airborne LiDAR urban point clouds contain a large amount of high-value target information related to human activity and nature, and semantic segmentation of such point clouds is the basis and key of a series of important applications such as urban three-dimensional modeling and high-precision map drawing. However, LiDAR data volumes are large, urban areas are complex, and the geometric attributes of ground features are variable, so existing algorithms struggle to achieve fine semantic segmentation of ground features.
Most traditional point cloud processing methods obtain features from point cloud data through specific hand-crafted definitions and complete the semantic segmentation of scattered point cloud data by training a group of feature classifiers such as random forests, support vector machines, and Gaussian mixture models. Obviously, the performance of such methods relies heavily on expert experience and the classification algorithm. Many scholars at home and abroad have proposed methods that generate three-dimensional descriptors for various application scenarios, such as spin images, Fast Point Feature Histograms (FPFH), Heat Kernel Signatures (HKS), Signatures of Histograms of Orientations (SHOT), and the like. However, these methods estimate the local features of each point independently and predict each label without considering the consistency between neighboring points. Therefore, the segmentation results are often affected by noise and label inconsistency, and such methods cannot be applied to all semantic segmentation scenes.
At present, due to the rapid development of deep learning, deep learning methods applied to three-dimensional point cloud semantic segmentation often outperform traditional point cloud segmentation methods. According to the type of convolution operator, prior art methods fall into two classes: three-dimensional point cloud semantic segmentation based on discrete convolution operators and three-dimensional point cloud semantic segmentation based on continuous convolution operators. Among methods based on discrete convolution, some scholars project the original three-dimensional point cloud from multiple angles onto two-dimensional planes and treat the problem as an image segmentation task, typified by SnapNet. However, such projection-based dimensionality reduction loses spatial information to a certain extent, and very poor results are easily obtained under complex spatial distributions. Others extend the pixel concept to three-dimensional space, organizing the topological relations between spatial points through voxelization before feeding them into a deep learning model, such as SEGCloud. However, this approach consumes excessive memory and has difficulty capturing high-resolution and fine-grained features.
To avoid the tedious processes of multi-directional projection and voxel construction described above, PointNet pioneered a deep learning architecture applied directly to points: it uses a shared multi-layer perceptron and a transformation network to learn the features of each point independently and extracts a global representation with a simple aggregation operation. PointNet++ takes the local structure of the point cloud into account, processes groups of points hierarchically, and then aggregates the obtained local features to generate higher-level features. However, these discrete-convolution-based methods do not achieve significant results on airborne LiDAR urban point clouds.
Methods for three-dimensional point cloud semantic segmentation based on continuous convolution operators define convolution operations in a continuous space, in which the weight of a neighbor is related to its spatial distribution around each central point. Researchers have proposed the kernel point continuous convolution network KPConv; however, that network contains only one kernel point convolution (KPConv) per downsampling layer and cannot achieve highly robust multi-level feature extraction for point clouds of different densities. A local and global encoder network (LGENet) was further proposed for semantic segmentation of airborne LiDAR urban point cloud data; it first extracts features through two-dimensional and three-dimensional kernel point continuous convolutions to learn representative geometric information. However, this approach does not take the global context of the unordered point cloud into account. In addition, the continuous kernel point convolution methods above fail to consider the imbalance and sparsity of airborne LiDAR point cloud data from both local and global perspectives, or how to perform multi-level feature learning, and thus cannot fully identify fine-grained point cloud semantic features.
In summary, the prior art does not fully consider local and global multi-level feature learning in airborne LiDAR urban point cloud semantic segmentation and ignores the imbalance of semantic categories. An airborne LiDAR urban point cloud semantic segmentation deep learning method that can represent global and local features at multiple levels, and thereby improve semantic segmentation accuracy, is therefore of significant research value.
Disclosure of Invention
In view of the above, the present invention provides an airborne LiDAR urban point cloud semantic segmentation method and system based on a recursive residual double-attention kernel point convolution network. The method and system first provide an attention kernel point convolution (AKPConv) module, which weights channel information with the batch normalization scale factor, representing the importance of each channel weight by its standard deviation, and enhances the local feature representation of the point cloud. Based on the AKPConv module, a recursive residual kernel point attention module (RRKA) is provided, which aggregates diversified features of neighborhood points through iterative cumulative learning. A five-layer coding module is built from RRKA modules, and a corresponding five-layer decoding module is then built according to the coding module. Finally, in order to fully fuse the cross-layer features of the coding and decoding layers, a global-local channel attention interaction module (GLCA) is provided to fuse global and local information and improve the discrimination of fine point cloud segmentation. The method and system can better acquire and analyze airborne LiDAR urban point clouds and have advantages over other methods in segmenting point clouds with unbalanced categories.
In order to achieve the purpose, the invention provides the following technical scheme:
An airborne LiDAR urban point cloud semantic segmentation method of a recursive residual double-attention kernel point convolution network, comprising the following steps: S1: acquiring a point cloud of a target area; S2: preprocessing the acquired point cloud of the target area to obtain training sample data and test sample data; S3: inputting the labeled samples into the recursive residual double-attention kernel point convolution network for training; S4: after training is finished, performing semantic segmentation on the test samples to obtain the result.
Further, in step S1, the urban point cloud data are collected with an unmanned aerial vehicle or a manned aerial platform carrying LiDAR, specifically including: S11, selecting the target area to be segmented according to the task requirements of urban point cloud semantic segmentation, setting the flight parameters of the airborne platform, including but not limited to flight height and speed, and planning the flight route, adopting a Z-shaped route; S12, setting the scanning parameters of the LiDAR, considering the overlap rate of the point clouds and ensuring scanning accuracy, selecting a laser radar based on non-repetitive scanning, with an FOV of 70.4° × 77.2°, a ranging accuracy of 3 cm to 1000 cm, and a maximum multi-echo rate of 480000 points/second; S13, during acquisition, carrying the LiDAR on the airborne platform and flying along the planned scanning route to obtain the point cloud data of the target area.
Further, in step S2, the acquired point cloud of the target area is preprocessed, mainly including point cloud registration, noise removal, and radiometric correction; training samples and test samples are then extracted in blocks from the point cloud data, and the training samples are labeled with semantic labels. In particular, to reduce the influence of the long-tailed distribution of the echo intensity of the point cloud data and improve the robustness of the network, the echo intensity data are reshaped toward a normal distribution using a gamma transform:

I_γ = 255 · (I / I_max)^γ

wherein I is the acquired echo intensity and I_max its maximum value; I_γ is the echo intensity after gamma correction, in the range of 0 to 255; γ is a parameter with 0 ≤ γ ≤ 1. Through this formula, the raw echo intensities are mapped to image space.
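The gamma transform above can be sketched in a few lines of numpy; the normalization by the maximum acquired intensity is an assumption of this sketch, since the patent only states that raw intensities with 0 ≤ γ ≤ 1 are mapped to image space:

```python
import numpy as np

def gamma_correct(intensity, gamma=0.5):
    """Map raw echo intensities to [0, 255] with a gamma transform.

    Normalizing by the maximum intensity is an assumption of this sketch;
    with gamma < 1 the power law compresses the long upper tail of the
    intensity distribution, as described in the text.
    """
    intensity = np.asarray(intensity, dtype=np.float64)
    normalized = intensity / intensity.max()   # scale to [0, 1]
    return 255.0 * normalized ** gamma         # compress the long tail

# Hypothetical long-tailed raw echo intensities
raw = np.array([1.0, 10.0, 100.0, 1000.0])
corrected = gamma_correct(raw, gamma=0.5)
```

With γ = 0.5 the four sample intensities, which span three orders of magnitude, are mapped to a much narrower spread within [0, 255] while preserving their ordering.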
Further, in step S3, the gamma-corrected point cloud training data are input into an attention kernel point convolution block (AKPConv), and point cloud features are learned to obtain attention kernel point convolution features; during down-sampling, the AKPConv module can also be used to down-sample the number of points. Specifically, the attention weight is obtained from the scale factor of the batch normalization layer, giving the channel attention feature f_c as follows:

f_out = (f_in − μ_B) / √(σ_B² + ε),  f_c = f_in ⊙ sigmoid(f_out)

wherein f_in is the input point cloud sample feature; μ_B is the mean of f_in and σ_B² its variance; ε is a small constant avoiding a zero denominator, set to 1×10⁻⁵; f_out is the normalized point cloud sample feature; ⊙ is the element-wise product; sigmoid() is the sigmoid function, i.e., sigmoid(x) = 1/(1 + e^(−x)). Then, the attention feature is input into the kernel point convolution to obtain the feature F_1:

F_1 = Conv_{1×1}(KPConv(f_c))

wherein Conv_{1×1} denotes a 1×1 convolution and KPConv(f_c) denotes the kernel point convolution operation, with the specific formula:

KPConv(f_c)(x) = Σ_{x_i ∈ N_x} κ(x_i − x) f_i

wherein N_x = {x_i ∈ x | ‖x_i − x‖ ≤ r} is the neighborhood set of point x at a fixed radius r, x_i is any point in this neighborhood of x, and f_i is the feature corresponding to the point cloud subset x_i. The kernel function κ(·) is as follows:

κ(y) = Σ_{k=1}^{n_k} h(y, ỹ_k) W_k

wherein ỹ_k represents the position of the k-th spherical kernel point in 3D space, n_k denotes the number of kernel points of κ(·), and W_k is the weight matrix of the corresponding kernel point; the correlation function is

h(y, ỹ_k) = max(0, 1 − ‖y − ỹ_k‖ / σ)

where σ is a hyper-parameter used to control the influence range of the kernel points. In order to preserve the input features, a skip connection is added in the AKPConv block; for the skip connection, maxpooling() is an optional max pooling operation used when D_in = 2D, and the skip branch can be expressed as:

F_2 = Conv_{1×1}(maxpooling(f_in))

Finally, the output feature F_AKPConv of AKPConv can be expressed as:

F_AKPConv = ReLU(F_1 + F_2)

where ReLU(x) = max(0, x) denotes the activation function.
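The batch-normalization-based channel attention at the heart of AKPConv can be sketched in numpy; this is a minimal reading of the formulas above, with the learnable scale/shift parameters of batch normalization omitted for brevity:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(f_in, eps=1e-5):
    """BN-style channel attention sketch.

    f_in: (N, C) point features. Each channel is standardized with its
    batch mean/variance (f_out), squashed by a sigmoid, and used to
    reweight the input element-wise:
        f_out = (f_in - mu_B) / sqrt(sigma_B^2 + eps)
        f_c   = f_in * sigmoid(f_out)
    """
    mu = f_in.mean(axis=0, keepdims=True)     # mu_B, per-channel mean
    var = f_in.var(axis=0, keepdims=True)     # sigma_B^2, per-channel variance
    f_out = (f_in - mu) / np.sqrt(var + eps)  # normalized features
    return f_in * sigmoid(f_out)              # element-wise reweighting

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 4))   # 8 points, 4 channels (toy sizes)
f_c = channel_attention(feats)
```

Because the sigmoid gate lies strictly in (0, 1), each attended feature is attenuated relative to its input, with near-average responses suppressed more strongly than outlying ones.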
Further, in order to accumulate aggregated local features and generate diversified features, a recursive residual kernel attention module (RRKA) is composed of a recursive point convolution block (RPConv) built from AKPConv and a multi-layer perceptron (MLP) with a single hidden layer. The recursive point convolution (RPConv) block is mainly used to learn the accumulated neighborhood features of the point cloud, formulated as:

x_l^t = AKPConv(x_l^(t−1)),  t = 1, 2, 3, ..., T

wherein x_l^0 is the RPConv input of level l for T recursions; AKPConv() denotes the attention kernel point convolution operation; x_l^t is the output after t recursions, and x_l^(t−1) is the output of recursion t − 1. To maintain the computational efficiency of RPConv, an MLP is first used to compress the feature dimension before RPConv is applied, and an MLP is then used to restore the feature dimension. A compression loop block (CRB) based on RPConv and MLP is therefore constructed to improve computational efficiency:

x_cout = ReLU(BN(W_2 * RPConv(BN(W_1 * x_cin))))

wherein W_1 and W_2 are the learnable parameters of the MLPs, RPConv() is the recursive point convolution operation, and BN() is the batch normalization operation. On the basis of CRB, a recursive residual kernel attention (RRKA) module with a residual connection is developed; this module effectively applies repeated operations to local features and enhances the diversity of point cloud feature representations. The RRKA output can be expressed as:

x_out = CRB(CRB(W_0 * x_in)) + x_in

wherein W_0 is the learnable weight of the MLP, x_in is the input point cloud feature, and CRB() is the CRB operation.
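The RPConv / CRB / RRKA composition above can be sketched with plain matrix operations; the toy channel-mixing operator standing in for AKPConv, the weight shapes, and the omission of batch normalization are all assumptions of this sketch:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def rpconv(x, conv, T=3):
    """Recursive point convolution: apply the same shared-weight operator
    T times, x^t = conv(x^{t-1}), accumulating neighborhood features."""
    for _ in range(T):
        x = conv(x)
    return x

def crb(x, w1, w2, conv, T=3):
    """Compression loop block: MLP (w1) compresses the channel dimension,
    RPConv runs on compressed features, MLP (w2) restores the dimension.
    x_cout = ReLU(W2 * RPConv(W1 * x)); batch norm omitted in this sketch."""
    return relu(rpconv(x @ w1, conv, T) @ w2)

def rrka(x, w0, w1, w2, conv, T=3):
    """RRKA: two stacked CRBs plus a residual connection,
    x_out = CRB(CRB(W0 * x)) + x."""
    return crb(crb(x @ w0, w1, w2, conv, T), w1, w2, conv, T) + x

# Toy stand-in for AKPConv: a fixed channel-mixing matrix (hypothetical)
rng = np.random.default_rng(1)
C, Cr = 8, 4                           # full and compressed channel widths
mix = rng.normal(size=(Cr, Cr)) * 0.1
conv = lambda z: relu(z @ mix)
w0 = np.eye(C)
w1 = rng.normal(size=(C, Cr)) * 0.1    # compress C -> Cr
w2 = rng.normal(size=(Cr, C)) * 0.1    # restore Cr -> C
x = rng.normal(size=(16, C))           # 16 points, C channels
out = rrka(x, w0, w1, w2, conv, T=3)
```

The residual connection guarantees the output keeps the input shape, so RRKA blocks can be stacked to form the five-layer encoder described in the text.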
Further, after the five coding layers described above, decoding operations are performed; after the low-level and high-level features are concatenated, a global-local channel attention module (GLCA) is applied to the feature tensor. First, a fully connected layer realizes channel information fusion of the low-level and high-level features in the global space, giving x_g as follows:

x_g = BN(W_g * x_in)

wherein the module input is x_in ∈ R^(N×C), N and C are the number of up-sampled points and the feature dimension, and W_g is the fully connected weight fusing low-level and high-level features. In the local feature computation, attention weights are obtained using average pooling and a one-dimensional convolution:

ω = sigmoid(W_k * (avgpooling(x_g)))

wherein avgpooling(): R^(N×C) → R^C is the channel-wise average pooling operation and W_k is a learnable local one-dimensional convolution kernel of size k = 5. From the attention weight ω, the local attention feature x_l can be obtained (⊙ denotes element-by-element multiplication):

x_l = BN(ω ⊙ x_g)

The output x_out of the global-local cross-layer information interaction module GLCA is expressed as:

x_out = x_g + x_l
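A minimal numpy sketch of the GLCA computation follows; the edge-padding of the pooled channel vector and the omission of batch normalization are assumptions, and the identity fusion weight is purely illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def glca(x_in, w_g, w_k):
    """Global-local channel attention sketch.

    Global branch: fully connected fusion x_g = x_in @ w_g (BN omitted).
    Local branch: average-pool over the N points, slide a 1-D kernel of
    size k across channels to get per-channel weights omega, then
    x_l = omega * x_g. The module output is x_g + x_l.
    """
    x_g = x_in @ w_g                      # global channel fusion
    z = x_g.mean(axis=0)                  # channel-wise average pooling, shape (C,)
    k = len(w_k)
    zp = np.pad(z, k // 2, mode="edge")   # pad so each channel sees k neighbors
    omega = sigmoid(np.array([zp[c:c + k] @ w_k for c in range(len(z))]))
    x_l = omega * x_g                     # local attention feature
    return x_g + x_l

rng = np.random.default_rng(2)
x_in = rng.normal(size=(32, 16))   # N=32 up-sampled points, C=16 channels
w_g = np.eye(16)                   # illustrative fusion weight
w_k = rng.normal(size=5) * 0.1     # 1-D convolution kernel, k = 5
out = glca(x_in, w_g, w_k)
```

The local branch only mixes each channel with its k nearest channels, which keeps the attention cost linear in C while the global branch handles full cross-channel fusion.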
further, the feature tensor passes through two full-connection layers, semantic segmentation results are obtained through a Sigmoid activation function, a focus loss function is introduced for solving the problem of data imbalance, and the loss function is set as follows:
Figure BDA0004007980020000055
wherein λ is typically set to 2, α t Representing class weight parameters, N representing the number of point clouds, p jc Indicates that the jth sample is includedc probability of class; optimizing model parameters of the semantic segmentation framework by using a stochastic gradient descent method according to the focus loss function, and obtaining a trained semantic segmentation framework after training is completed; and judging the input test sample through the trained semantic segmentation frame, and outputting a semantic segmentation result.
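The focal loss can be sketched as follows; reading α_t as a per-class weight indexed by the true class, and restricting the inner sum to the true class of each point, are assumptions of this sketch:

```python
import numpy as np

def focal_loss(probs, labels, alpha, lam=2.0):
    """Focal loss over N points, following the reconstructed formula
    L = -(1/N) sum_j alpha_c (1 - p_{j,c})^lam log(p_{j,c}),
    where c is the true class of point j.

    probs:  (N, C) predicted class probabilities
    labels: (N,)   true class indices
    alpha:  (C,)   per-class weights (one common reading of alpha_t)
    lam:    focusing parameter, typically 2
    """
    n = len(labels)
    p_t = probs[np.arange(n), labels]   # probability of the true class
    a_t = alpha[labels]                 # class weight for each point
    return float(np.mean(-a_t * (1.0 - p_t) ** lam * np.log(p_t)))

probs = np.array([[0.9, 0.1],    # confident, correct
                  [0.6, 0.4],    # less confident
                  [0.2, 0.8]])   # confident, correct (class 1)
labels = np.array([0, 0, 1])
alpha = np.array([1.0, 1.0])
loss = focal_loss(probs, labels, alpha, lam=2.0)
```

The (1 − p)^λ factor down-weights points the network already classifies confidently, so rare, hard classes dominate the gradient, which is exactly why the loss suits class-imbalanced point clouds.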
The invention has the beneficial effects that:
the attention kernel point convolution representation module can effectively learn local neighborhood characteristics of the point cloud, weight channel information by using batch normalization scale factors, and represent the importance of channel weight by using standard deviation. The invention provides a recursive residual error kernel attention module which takes an attention kernel convolution module as a key point to mine multi-level point cloud local information and generate low-level semantic features with discriminative power. The global-local channel attention module provided by the invention fuses the up-sampled high-level features and the low-level features connected in a jumper connection, realizes global and local information interaction, and improves the fine point cloud segmentation effect. The invention provides an airborne LiDAR urban point cloud semantic segmentation method RRDAN of a recursive residual error double-attention kernel point convolution network, which is concentrated in multi-level feature representation learning and has strong representation capability on the airborne LiDAR urban point cloud with unbalanced category. Experimental results on two airborne LiDAR urban point cloud data sets show that the performance of the RRDAN is superior to the most advanced airborne LiDAR urban point cloud semantic segmentation method at present.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of an airborne platform collecting point cloud;
FIG. 3 is a diagram of a recursive residual double attention kernel convolutional network (RRDAN) for on-board LiDAR city point cloud semantic segmentation;
FIG. 4 is a block diagram of the attention kernel convolution module AKPConv of the present invention;
FIG. 5 is a block diagram of the recursive residual error kernel attention module RRKA of the present invention;
FIG. 6 is a block diagram of a global-local channel attention interaction module GLCA in accordance with the present invention;
FIG. 7 is a graph of the results of an RRDAN network of the present invention;
fig. 8 is an error chart of the experimental result of the RRDAN network of the present invention.
Detailed Description
The technical scheme of the invention is explained in detail in the following with the accompanying drawings.
FIG. 1 is a flow chart of the method of the invention; the invention provides an airborne LiDAR urban point cloud semantic segmentation method and system of a recursive residual double-attention kernel point convolution network. The deep learning network for semantic segmentation is shown in FIG. 3 and can learn fine and representative features from airborne point clouds. The network consists of an attention kernel point convolution module (AKPConv), a recursive residual kernel attention module (RRKA), and a global-local channel attention interaction module (GLCA). First, local features of the point cloud data, after gamma correction of the reflection intensity, are obtained through three layers of attention kernel point convolution modules; semantic features with multi-level fine representation capability are then cumulatively learned from these local features through the RRKA modules. Then, after concatenation of the low-level and high-level features, a global-local channel attention module (GLCA) is applied to the feature tensor to learn the fused semantic information. The attention kernel point convolution module (AKPConv) designed by the invention extracts local features of the point cloud neighborhood, embedding an attention mechanism to weight channel features and adding a skip connection to fuse context information. The recursive residual kernel point attention module (RRKA) provided by the invention aggregates diversified features of neighborhood points through iterative cumulative learning. Through the network's dual attention mechanism, subtle feature representations are enhanced to improve segmentation performance.
The invention provides a multilevel double-attention-core point convolution network which takes key information recursive accumulation learning as a key point to mine the intrinsic information of the airborne point cloud with unbalanced category distribution and generate high-level semantic features with discriminative power.
Specifically, the technical scheme of the invention comprises the following contents:
1. Data acquisition: urban point cloud data are collected with an unmanned aerial vehicle or a manned aerial platform carrying LiDAR. First, according to the requirements of the urban point cloud semantic segmentation task, the target area to be segmented is selected, the flight parameters of the airborne platform (including flight height and speed) are set, and the flight route is planned, generally as a Z-shaped route. Then, the scanning parameters of the LiDAR are set; considering the point cloud density and ensuring scanning accuracy, a laser radar based on non-repetitive scanning is selected, with an FOV of 70.4° × 77.2°, a ranging accuracy of 3 cm to 1000 cm, and a maximum multi-echo rate of 480000 points/second. During acquisition, the airborne platform carries the LiDAR and flies along the planned scanning route to obtain the point cloud data of the target area, as shown schematically in FIG. 2.
2. Data preprocessing: the acquired point cloud of the target area is preprocessed, mainly including point cloud registration, noise removal, and radiometric correction; training samples and test samples are then extracted in blocks from the point cloud data, and the training samples are labeled with semantic labels. In particular, to reduce the influence of the long-tailed distribution of the echo intensity of the point cloud data and improve the robustness of the network, the echo intensity data are reshaped toward a normal distribution using a gamma transform:

I_γ = 255 · (I / I_max)^γ

wherein I is the acquired echo intensity and I_max its maximum value; I_γ is the echo intensity after gamma correction, ranging from 0 to 255; γ is a parameter with 0 ≤ γ ≤ 1. Through this formula, the raw echo intensities are mapped to image space.
3. The gamma-corrected point cloud training data are input into an attention kernel point convolution block (AKPConv), as shown in FIG. 4, and point cloud features are learned to obtain attention kernel point convolution features; during down-sampling, the AKPConv module can also be used to down-sample the number of points. Specifically, the attention weight is obtained from the scale factor of the batch normalization layer, giving the channel attention feature f_c as follows:

f_out = (f_in − μ_B) / √(σ_B² + ε),  f_c = f_in ⊙ sigmoid(f_out)

wherein f_in is the input point cloud sample feature; μ_B is the mean of f_in and σ_B² its variance; ε is a small constant avoiding a zero denominator, set to 1×10⁻⁵; f_out is the normalized point cloud sample feature; ⊙ is the element-wise product; sigmoid() is the sigmoid function, i.e., sigmoid(x) = 1/(1 + e^(−x)). Then, the attention feature is input into the kernel point convolution to obtain the feature F_1:

F_1 = Conv_{1×1}(KPConv(f_c))

wherein Conv_{1×1} denotes a 1×1 convolution and KPConv() denotes the kernel point convolution operation, with the specific formula:

KPConv(f_c)(x) = Σ_{x_i ∈ N_x} κ(x_i − x) f_i

wherein N_x = {x_i ∈ x | ‖x_i − x‖ ≤ r} is the neighborhood set of point x at a fixed radius r, x_i is any point in this neighborhood, and f_i is the feature corresponding to the point cloud subset x_i. The kernel function κ(·) is as follows:

κ(y) = Σ_{k=1}^{n_k} h(y, ỹ_k) W_k

wherein ỹ_k represents the position of the k-th spherical kernel point in 3D space, n_k denotes the number of kernel points of κ(·), and W_k is the weight matrix of the corresponding kernel point; the correlation function is

h(y, ỹ_k) = max(0, 1 − ‖y − ỹ_k‖ / σ)

where σ is a hyper-parameter used to control the influence range of the kernel points.
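A single-center numpy sketch of the kernel point convolution and its linear correlation function follows; the toy offsets, kernel point positions, channel widths, and σ value are all hypothetical:

```python
import numpy as np

def kernel_correlation(y, y_k, sigma):
    """Linear correlation h(y, y_k) = max(0, 1 - ||y - y_k|| / sigma)
    between a neighbor offset y and a kernel point y_k."""
    return max(0.0, 1.0 - np.linalg.norm(y - y_k) / sigma)

def kpconv_point(offsets, feats, kernel_pts, weights, sigma=0.3):
    """Kernel point convolution at one center point: sum over neighbors
    of kappa(x_i - x) f_i, with kappa(y) = sum_k h(y, y_k) W_k."""
    out = np.zeros(weights.shape[-1])
    for y, f in zip(offsets, feats):           # neighbors in N_x
        for yk, w in zip(kernel_pts, weights): # kernel points and their W_k
            out += kernel_correlation(y, yk, sigma) * (f @ w)
    return out

# Toy setup: 2 neighbors, 3 kernel points, 2 -> 2 channels (hypothetical)
rng = np.random.default_rng(3)
offsets = np.array([[0.05, 0.0, 0.0],
                    [0.0, 0.1, 0.0]])          # x_i - x
feats = rng.normal(size=(2, 2))                # f_i
kernel_pts = np.array([[0.0, 0.0, 0.0],
                       [0.2, 0.0, 0.0],
                       [0.0, 0.2, 0.0]])       # spherical kernel point positions
weights = rng.normal(size=(3, 2, 2))           # W_k, one matrix per kernel point
out = kpconv_point(offsets, feats, kernel_pts, weights)
```

Each neighbor's feature is mixed by every kernel point's weight matrix, scaled by how close the neighbor lies to that kernel point; neighbors farther than σ from a kernel point contribute nothing through it.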
In order to preserve the input features, a skip connection is added in the AKPConv block; for the skip connection, maxpooling() is an optional max pooling operation, used when D_in = 2D, and the skip branch can be expressed as:

F_2 = Conv_{1×1}(maxpooling(f_in))

Finally, the output feature F_AKPConv of AKPConv can be expressed as:

F_AKPConv = ReLU(F_1 + F_2)

where ReLU(x) = max(0, x) denotes the activation function.
4. In order to accumulate aggregated local features and generate diversified features, a recursive residual kernel attention module (RRKA) is composed of a recursive point convolution block (RPConv) built from AKPConv and a multi-layer perceptron (MLP) with a single hidden layer, as shown in FIG. 5. The recursive point convolution (RPConv) block is mainly used to learn the accumulated neighborhood features of the point cloud, formulated as:

x_l^t = AKPConv(x_l^(t−1)),  t = 1, 2, 3, ..., T

wherein x_l^0 is the RPConv input of level l for T recursions; AKPConv() denotes the attention kernel point convolution operation; x_l^t is the output after t recursions, and x_l^(t−1) is the output of recursion t − 1.
To maintain the computational efficiency of RPConv, an MLP is first employed to compress the feature dimension before RPConv is used, and an MLP is then employed to restore it. A compression loop block (CRB) based on RPConv and MLP is therefore constructed to improve computational efficiency:

x_cout = ReLU(BN(W_2 * RPConv(BN(W_1 * x_cin))))

wherein W_1 and W_2 are the learnable parameters of the MLPs, RPConv() is the recursive point convolution operation, and BN() is the batch normalization operation.
On the basis of CRB, a recursive residual kernel attention (RRKA) module with a residual connection is developed; this module effectively applies repeated operations to local features, enhances the diversity of point cloud feature representations, and its output can be expressed as follows:
x out =CRB(CRB(W 0 *x in ))+x in
wherein W is 0 Learning parameters, x, for the weights of MLP in For the input point cloud feature, CRB () is a CRB operation.
5. After the five layers of the above coding modules, a decoding operation is performed; after concatenation of the low-level and high-level features, a global-local channel attention (GLCA) module is applied to the feature tensor, as shown in fig. 6.

The method first adopts a fully connected layer to fuse the channel information of the low-level and high-level features over the global space:

x_g = BN(W_g * x_in)

wherein the module input is x_in ∈ R^{N×C}, N and C are the number of up-sampled points and the feature dimension, and W_g is the fully connected weight fusing the low-level and high-level features.
In the local feature computation, attention weights are obtained using average pooling and a one-dimensional convolution:

ω = sigmoid(W_k * avgpooling(x_g))

wherein avgpooling(·) averages x_g over the points to give one value per channel, and W_k is a learnable local one-dimensional convolution kernel of size k = 5 applied across neighboring channels, i.e. for channel i:

ω_i = sigmoid(Σ_{j=1}^{k} W_k^j g_i^j), g_i^j ∈ Ω_i^k

wherein g = avgpooling(x_g) and Ω_i^k is the set of k channels adjacent to channel i. From the attention weight ω, the local attention feature x_l is obtained (⊙ denotes element-wise multiplication):

x_l = BN(ω ⊙ x_g)
output result x of global-local cross-layer information interaction module GLCA out Expressed as:
x out =x g +x l
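The GLCA computation above can be sketched compactly in NumPy. This is an illustrative sketch under assumptions: batch normalization is omitted, and the channel-wise 1-D convolution uses edge padding (the padding scheme is not specified in the text).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def glca(x_in, w_g, w_k):
    # x_in: (N, C) up-sampled point features after low/high-level concatenation
    # global branch: fully connected fusion across channels, x_g = W_g * x_in
    x_g = x_in @ w_g                               # (N, C)
    # local branch: channel-wise average pooling over all N points
    pooled = x_g.mean(axis=0)                      # (C,)
    # 1-D convolution of size k over neighboring channels
    k = len(w_k)
    padded = np.pad(pooled, k // 2, mode="edge")
    conv = np.array([padded[i:i + k] @ w_k for i in range(len(pooled))])
    omega = sigmoid(conv)                          # (C,) attention weights
    x_l = omega * x_g                              # broadcast over the N points
    return x_g + x_l                               # sum of global and local outputs

rng = np.random.default_rng(1)
N, C = 64, 16
x = rng.standard_normal((N, C))
w_g = rng.standard_normal((C, C)) * 0.1
w_k = rng.standard_normal(5) * 0.1                 # kernel size k = 5 as in the text
out = glca(x, w_g, w_k)
print(out.shape)  # (64, 16)
```

Because the attention weight ω has one entry per channel, the local branch re-weights every point's features with the same per-channel mask before the residual sum with the global branch.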
6. The feature tensor then passes through two fully connected layers, and the semantic segmentation result is obtained through a Sigmoid activation function. A focal loss function is introduced to address the class imbalance problem:

L = -(1/N) Σ_{j=1}^{N} α_t (1 - p_{jc})^λ log(p_{jc})

wherein λ is generally set to 2, α_t denotes the class weight parameter, N denotes the number of points, and p_{jc} denotes the predicted probability that the j-th sample belongs to its true class c. The model parameters of the semantic segmentation framework are optimized with the stochastic gradient descent method on the focal loss, and the trained framework is obtained after training; the input test samples are then classified by the trained semantic segmentation framework, which outputs the semantic segmentation result.
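A minimal NumPy sketch of the focal loss on per-point class probabilities; the α values and the toy probabilities below are illustrative, not from the patent.

```python
import numpy as np

def focal_loss(probs, labels, alpha, lam=2.0):
    # Mean over N points of -alpha_t * (1 - p_t)^lambda * log(p_t),
    # where p_t is the predicted probability of each point's true class.
    n = probs.shape[0]
    p_t = probs[np.arange(n), labels]      # probability assigned to the true class
    a_t = alpha[labels]                    # per-class weight alpha_t
    return float(np.mean(-a_t * (1.0 - p_t) ** lam * np.log(p_t + 1e-12)))

# toy example: 3 points, 2 classes; the rarer class gets the larger alpha
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
labels = np.array([0, 1, 1])
alpha = np.array([0.25, 0.75])
print(round(focal_loss(probs, labels, alpha), 4))  # → 0.0848
```

The (1 - p_t)^λ factor down-weights confidently classified points (the first two rows) so that the loss is dominated by the poorly classified third point, which is the mechanism that counters class imbalance.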
Fig. 7 shows the experimental results of the RRDAN semantic segmentation network of the present invention on the open-source airborne urban point cloud data set ISPRS, and fig. 8 shows the error map; the 9 classes in the test area are well segmented. The segmentation effect of the invention is further illustrated by comparative experiments on the ISPRS data set against existing methods such as LUH, RIT_1, alsNet, KPConv, DPE, GANet, DANCE-NET, D-FCN, RandLA-Net, GraNet and LGENet. The Overall Accuracy (OA) and average F1 score (Avg.F1) are computed for each method, as shown in Table 1: a larger OA means a higher proportion of correctly predicted points, and a larger Avg.F1 indicates a better overall evaluation of the results.

Table 1. Comparison of RRDAN with the various methods on the ISPRS data set
[Table 1 is provided as an image in the original publication and is not reproduced here.]

It can be seen that the method of the present invention achieves the best OA and Avg.F1 on this data set, outperforming the other airborne LiDAR urban point cloud semantic segmentation methods, with particular advantages on the car, roof, facade, low-shrub and tree classes.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications may be made to the technical solutions of the present invention without departing from their spirit and scope, and all such modifications shall be covered by the claims of the present invention.

Claims (8)

1. An airborne LiDAR urban point cloud semantic segmentation method of a recursive residual double-attention kernel point convolution network, characterized by comprising the following steps:
S1: acquiring a point cloud of a target area;
S2: preprocessing the acquired point cloud of the target area to obtain training sample data and test sample data;
S3: inputting the labeled training samples into the recursive residual double-attention kernel point convolution network for training;
S4: after training is finished, performing semantic segmentation on the test samples and obtaining the result.
2. The airborne LiDAR urban point cloud semantic segmentation method of a recursive residual double-attention kernel point convolution network according to claim 1, characterized in that: in step S1, the urban point cloud data are collected with an unmanned aerial vehicle or a manned platform carrying the LiDAR, specifically comprising: S11, selecting the target area to be segmented according to the task requirements of urban point cloud semantic segmentation, setting the flight parameters of the airborne platform, including but not limited to flight height and speed, and planning the flight route, adopting a Z-shaped (zigzag) route; S12, setting the scanning parameters of the LiDAR in view of the point cloud overlap rate so as to ensure scanning accuracy, selecting a laser radar based on non-repetitive scanning, with an FOV of 70.4° × 77.2°, a ranging accuracy of 3 cm to 1000 cm, and a maximum multi-echo rate of 480000 points/second; S13, during acquisition, carrying the LiDAR on the airborne platform, flying along the set scanning route, and collecting the point cloud data of the target area.
3. The airborne LiDAR urban point cloud semantic segmentation method of a recursive residual double-attention kernel point convolution network according to claim 2, characterized in that: in step S2, the acquired point cloud of the target area is preprocessed, mainly comprising point cloud registration, noise removal and radiation correction; training samples and test samples are then extracted in blocks from the point cloud data, and the training samples are annotated with semantic labels; in particular, in order to reduce the influence of the long-tailed distribution of the echo intensity of the point cloud data, the echo intensity is reshaped toward a normal distribution by a gamma transform, improving the robustness of the network:

I_γ = 255 × (I / I_max)^γ

wherein I is the acquired echo intensity, I_max is its maximum value, I_γ is the gamma-corrected echo intensity in the range 0 to 255, and γ is a parameter with 0 ≤ γ ≤ 1; through this formula, the raw echo intensities are mapped to image space.
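The gamma transform of the echo intensity can be sketched as follows; note that the normalization by the maximum intensity is an assumption made for illustration (the image placeholder for the original formula is not recoverable), as is the choice γ = 0.5.

```python
import numpy as np

def gamma_correct(intensity, gamma=0.5):
    # Map raw (long-tailed) echo intensities into [0, 255]:
    # I_gamma = 255 * (I / I_max)^gamma, with 0 <= gamma <= 1.
    i_max = float(np.max(intensity))
    return 255.0 * (intensity / i_max) ** gamma

raw = np.array([10.0, 100.0, 1000.0, 10000.0])  # long-tailed intensities
print(np.round(gamma_correct(raw), 1))
```

With γ < 1 the transform is concave, so the small intensities that dominate a long-tailed distribution are spread over a wider portion of the 0-255 range while the rare large values are compressed.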
4. The airborne LiDAR urban point cloud semantic segmentation method of a recursive residual double-attention kernel point convolution network according to claim 3, characterized in that: in step S3, the gamma-corrected point cloud training data are input into the attention kernel point convolution (AKPConv) block, which learns the point cloud features to obtain attention kernel convolution features; during down-sampling, the AKPConv module also reduces the number of points; specifically, the attention weight is obtained from the scale factor of a sample normalization layer, giving the channel attention feature f_c:

f_out = (f_in − μ) / sqrt(σ² + ε)
f_c = sigmoid(f_out) ⊙ f_in

wherein f_in is the input point cloud sample feature, μ is the mean of f_in and σ² its variance; ε is a small constant set to 1×10⁻⁵ to avoid a zero denominator; f_out is the normalized point cloud sample feature; ⊙ is the element-wise product; sigmoid() is the sigmoid function, sigmoid(x) = 1/(1 + e^{−x}); the attention feature is then passed through the kernel point convolution to obtain the feature F_1:
F_1 = Conv_{1×1}(KPConv(f_c))
wherein Conv_{1×1}() denotes a 1×1 convolution and KPConv() denotes the kernel point convolution operation:

KPConv(f_c)(x) = Σ_{x_i ∈ N_x} κ(x_i − x) f_i

wherein N_x is the neighborhood set of point x within a fixed radius r (r ∈ R), i.e. N_x = {x_i ∈ x | ||x_i − x|| ≤ r}, x_i is any point in the neighborhood of x, and f_i is the feature corresponding to x_i; the kernel function κ(·) is defined as:

κ(y) = Σ_{k=1}^{n_k} h(y, x̃_k) W_k

wherein x̃_k denotes the position of the k-th spherical kernel point in 3D space, n_k denotes the number of kernel points of κ(·), and W_k is the weight matrix of the corresponding kernel point; the correlation function is

h(y, x̃_k) = max(0, 1 − ||y − x̃_k|| / σ)

wherein σ is a hyper-parameter controlling the influence distance of the kernel points; in order to preserve the input features, a skip connection is added to the AKPConv block; for the skip connection, maxpooling() is an optional max pooling operation, used when D_in = 2D, and the skip branch can be expressed as:

F_2 = Conv_{1×1}(maxpooling(f_in))
finally, the output characteristic F of AKPConv AKPConv Can be expressed as:
F AKPConv =ReLU(F 1 +F 2 )
where ReLU (x) = max (0, x) denotes an activation function.
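The kernel point convolution at a single query point can be sketched directly from the formulas for κ(·) and the correlation h. This is a minimal sketch: the neighborhood search, bias, and activation are omitted, and all shapes and values below are illustrative.

```python
import numpy as np

def kpconv_point(x, neighbors, feats, kernel_pts, weights, sigma=0.3):
    # Kernel point convolution at query point x:
    # sum over neighbors x_i of kappa(x_i - x) f_i, where
    # kappa(y) = sum_k h(y, xk) W_k and h(y, xk) = max(0, 1 - ||y - xk|| / sigma).
    out = np.zeros(weights.shape[2])
    for xi, fi in zip(neighbors, feats):
        y = xi - x                                  # relative neighbor position
        for xk, wk in zip(kernel_pts, weights):
            h = max(0.0, 1.0 - np.linalg.norm(y - xk) / sigma)
            out += h * (fi @ wk)                    # correlation-weighted projection
    return out

rng = np.random.default_rng(2)
d_in, d_out, n_k = 4, 8, 5
x = np.zeros(3)                                     # query point at the origin
neighbors = rng.uniform(-0.2, 0.2, (6, 3))          # 6 points within the radius r
feats = rng.standard_normal((6, d_in))              # features f_i of the neighbors
kernel_pts = rng.uniform(-0.2, 0.2, (n_k, 3))       # spherical kernel point positions
weights = rng.standard_normal((n_k, d_in, d_out)) * 0.1
print(kpconv_point(x, neighbors, feats, kernel_pts, weights).shape)  # (8,)
```

Each neighbor contributes through every kernel point, scaled by the linear correlation h, which vanishes for neighbors farther than σ from a kernel point; this is what makes the convolution sensitive to local geometry.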
5. The airborne LiDAR urban point cloud semantic segmentation method of a recursive residual double-attention kernel point convolution network according to claim 4, characterized in that: in order to accumulate aggregated local features and generate diversified features, a recursive residual kernel attention (RRKA) module is composed of a recursive point convolution (RPConv) block built from AKPConv and a multi-layer perceptron (MLP) with a single hidden layer; the RPConv block is mainly used for learning the accumulated neighborhood features of the point cloud:

x^{l,t} = AKPConv(x^{l,t-1}), t = 1, 2, …, T

wherein x^{l,t} is the output of the t-th of T recursions of RPConv at level l and x^{l,t-1} is the output of the (t-1)-th recursion (x^{l,0} being the input features); AKPConv() denotes the attention kernel point convolution operation; to keep RPConv computationally efficient, an MLP first compresses the feature dimension before RPConv is applied and a second MLP then restores it; a compression loop block (CRB) based on RPConv and MLP is therefore constructed to improve computational efficiency:
x_cout = ReLU(BN(W_2 * RPConv(BN(W_1 * x_cin))))
wherein W_1 and W_2 are the learnable parameters of the MLPs, RPConv() is the recursive point convolution operation, and BN() is a batch normalization operation; on the basis of the CRB, a recursive residual kernel attention (RRKA) module with a residual connection is developed, which effectively applies repeated operations to local features and enhances the diversity of the point cloud feature representation; the RRKA output can be expressed as:
x_out = CRB(CRB(W_0 * x_in)) + x_in
wherein W_0 is a learnable MLP weight, x_in is the input point cloud feature, and CRB() is a CRB operation.
6. The airborne LiDAR urban point cloud semantic segmentation method of a recursive residual double-attention kernel point convolution network according to claim 5, characterized in that: a decoding operation is performed after the five layers of coding modules, and after the low-level and high-level features are concatenated, a global-local channel attention (GLCA) module is applied to the feature tensor; firstly, a fully connected layer is adopted to fuse the channel information of the low-level and high-level features over the global space:

x_g = BN(W_g * x_in)

wherein the module input is x_in ∈ R^{N×C}, N and C are the number of up-sampled points and the feature dimension, and W_g is the fully connected weight fusing the low-level and high-level features; in the local feature computation, attention weights are obtained using average pooling and a one-dimensional convolution:
ω = sigmoid(W_k * avgpooling(x_g))
wherein avgpooling(·) averages x_g over the points to give one value per channel, and W_k is a learnable local one-dimensional convolution kernel of size k = 5 applied across neighboring channels, i.e. for channel i:

ω_i = sigmoid(Σ_{j=1}^{k} W_k^j g_i^j), g_i^j ∈ Ω_i^k

wherein g = avgpooling(x_g) and Ω_i^k is the set of k channels adjacent to channel i; from the attention weight ω, the local attention feature x_l is obtained (⊙ denotes element-wise multiplication):

x_l = BN(ω ⊙ x_g)

the output result x_out of the global-local cross-layer information interaction (GLCA) module is expressed as:

x_out = x_g + x_l.
7. The airborne LiDAR urban point cloud semantic segmentation method of a recursive residual double-attention kernel point convolution network according to claim 6, characterized in that: the feature tensor passes through two fully connected layers, and the semantic segmentation result is obtained through a Sigmoid activation function; a focal loss function is introduced to address the class imbalance problem:

L = -(1/N) Σ_{j=1}^{N} α_t (1 - p_{jc})^λ log(p_{jc})

wherein λ is generally set to 2, α_t denotes the class weight parameter, N denotes the number of points, and p_{jc} denotes the predicted probability that the j-th sample belongs to its true class c; the model parameters of the semantic segmentation framework are optimized with the stochastic gradient descent method on the focal loss, and the trained framework is obtained after training; the input test samples are classified by the trained semantic segmentation framework, and the semantic segmentation result is output.
8. An airborne LiDAR urban point cloud semantic segmentation system of a recursive residual double-attention kernel point convolution network, characterized in that: the system employs the method of any one of claims 1 to 7.
CN202211639217.8A 2022-12-20 2022-12-20 Airborne LiDAR (light detection and ranging) urban point cloud semantic segmentation method and system of recursive residual double-attention kernel point convolution network Pending CN115861619A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211639217.8A CN115861619A (en) 2022-12-20 2022-12-20 Airborne LiDAR (light detection and ranging) urban point cloud semantic segmentation method and system of recursive residual double-attention kernel point convolution network


Publications (1)

Publication Number Publication Date
CN115861619A true CN115861619A (en) 2023-03-28

Family

ID=85674419


Country Status (1)

Country Link
CN (1) CN115861619A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116413740A (en) * 2023-06-09 2023-07-11 广汽埃安新能源汽车股份有限公司 Laser radar point cloud ground detection method and device
CN116468892A (en) * 2023-04-24 2023-07-21 北京中科睿途科技有限公司 Semantic segmentation method and device of three-dimensional point cloud, electronic equipment and storage medium
CN116958557A (en) * 2023-08-11 2023-10-27 安徽大学 Three-dimensional indoor scene semantic segmentation method based on residual impulse neural network
CN117541799A (en) * 2024-01-09 2024-02-09 四川大学 Large-scale point cloud semantic segmentation method based on online random forest model multiplexing



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination