CN112215231B

CN112215231B - Large-scale point cloud semantic segmentation method combining spatial depth convolution and residual error structure

Info

Publication number: CN112215231B
Application number: CN202011048758.4A
Authority: CN
Inventors: 刘盛; 黄圣跃; 程豪豪; 沈家瑜; 陈胜勇
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2020-09-29
Filing date: 2020-09-29
Publication date: 2024-03-08
Anticipated expiration: 2040-09-29
Also published as: CN112215231A

Abstract

The invention discloses a large-scale point cloud semantic segmentation method combining a spatial depth convolution and a residual structure, which comprises the following steps: s1, constructing a semantic segmentation model; s2, acquiring point cloud data of a preset scene to obtain a point set P= { P ₁ ，p ₂ ，…，p _i ，…，p _N Sum feature set f= { F ₁ ，f ₂ ，…，f _i ，…，f _N P, where _i And f _i Respectively three-dimensional coordinates and characteristics of an ith point in the point cloud, wherein N is the number of points in the point cloud; s3, inputting the point set and the feature set into a semantic segmentation model; s4, obtaining the probability of each point in the semantic segmentation model output point cloud; s5, selecting the classification with the maximum probability of each point as a prediction label, and obtaining a point cloud segmentation result of a preset scene according to the prediction label. According to the method, the space depth volume and residual structure are combined to perform point cloud semantic segmentation, so that the memory consumption and the calculation consumption are reduced, the precision is improved, and the large-scale scene point cloud can be rapidly and effectively processed at one time.

Description

Large-scale point cloud semantic segmentation method combining spatial depth convolution and residual error structure

Technical Field

The invention belongs to the technical field of computer vision, and particularly relates to a large-scale point cloud semantic segmentation method combining a spatial depth convolution and a residual structure.

Background

At present, the point cloud semantic segmentation technology based on deep learning is rapidly developed in recent years, but a plurality of methods in the prior art have the following technical defects: the memory consumption is overlarge, large-scale scene point clouds cannot be directly processed at one time, and blocking processing is needed; the calculation consumption is too large, and the scene point cloud cannot be rapidly subjected to semantic segmentation; the precision is lower, and the insufficient semantic information is caused by insufficient depth of a constructed network structure and insufficient receptive field.

Disclosure of Invention

Aiming at the problems, the invention provides a large-scale point cloud semantic segmentation method combining a spatial depth convolution and a residual structure, which reduces memory consumption and calculation consumption, improves precision and can rapidly and effectively process large-scale scene point clouds at one time.

In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:

the invention provides a large-scale point cloud semantic segmentation method combining a spatial depth convolution and a residual structure, which comprises the following steps:

s1, constructing a semantic segmentation model, wherein the semantic segmentation model comprises a plurality of stages, each stage sequentially comprises an encoder and a decoder from an input side to an output side, a linear layer for expanding dimension is further arranged in front of the encoder of the first stage, and a segHead is further arranged behind the decoder of the first stage; the encoder of each stage combines the spatial depth convolution and residual structure to encode the input characteristics, strengthen semantic information and obtain the output characteristics of the encoder; the output characteristics of the encoders at other stages except the final stage are transmitted to the encoder at the next stage after passing through the downsampling layer; the output characteristics of the encoder of each stage and the output characteristics of the decoder of the next stage are input into the decoder of the stage together after passing through an up-sampling layer; the output characteristics of the first stage decoder output the probability that each point in the point cloud belongs to each class through the SegHead;

s2, acquiring point cloud data of a preset scene to obtain a point set P= { P ₁ ，p ₂ ，…，p _i ，…，p _N Feature set f= { F corresponding to point set ₁ ，f ₂ ，…，f _i ，…，f _N P, where _i And f _i Respectively three-dimensional coordinates and characteristics of an ith point in the point cloud, wherein N is the number of points in the point cloud;

s3, inputting the point set and the feature set into a semantic segmentation model;

s4, obtaining the probability of each point in the semantic segmentation model output point cloud;

s5, selecting the classification with the maximum probability of each point as a prediction label, and obtaining a point cloud data segmentation result of the preset scene according to the prediction label.

Preferably, the encoder comprises at least one SDR block, optionally with a DFA module added to the encoder, the DFA module being located on the input side of the SDR block.

Preferably, the input features of the DFA module sequentially pass through the downsampling layer, the linear layer, the activation function, the SDC operation, the linear layer, the activation function and the upsampling layer from the input side to the output side to obtain DFA intermediate features, the input features of the DFA module are further combined with the DFA intermediate features, and the output features of the DFA module are obtained through the linear layer and the activation function.

Preferably, the sampling mode of the up-sampling layer is nearest neighbor interpolation.

Preferably, the input features of the SDR block sequentially pass through the linear layer, the SDC operation, the linear layer and the linear layer from the input side to the output side to obtain intermediate features of the SDR block, and the input features of the SDR block are added with the intermediate features of the SDR block through shortcut and then the output features of the SDR block are obtained through an activation function.

Preferably, when the dimensions of the input feature and the output feature of the SDR block are equal, the input feature of the SDR block is shortcut; when the dimensions of the input features and the output features of the SDR block are unequal, the input features of the SDR block are expanded by the linear layer to obtain shortcut.

Preferably, the normalization process or activation function may be selectively added after the linear layer or SDC operation.

Preferably, the SDC operates as a spatial depth convolution, the feature f of the ith point after the convolution _i The' formula is as follows:

wherein p is _i And p _j Three-dimensional coordinates, Δp, of the i-th point and the j-th point, respectively _i，j ＝p _i -p _j Sigma is the data Δp _i，j Variance of f _j Characteristic of the jth point in the point cloud, Ω _i Is the ith point in the point cloudNeighbor index set, j is Ω _i Is a component of the group.

Preferably, the decoder of the last stage includes a linear layer, and the decoder of the other stages includes a Concat function and a linear layer in order from the input side to the output side, the Concat function combining the output characteristics of the encoder of the same stage with the output characteristics of the decoder of the next stage, and the linear layer being followed by a selective increase of the activation function.

Preferably, the SegHead comprises three linear layers and one softmax layer in order from the input side to the output side.

Compared with the prior art, the invention has the beneficial effects that: the space depth volume and residual structure are combined to perform point cloud semantic segmentation, so that memory consumption and calculation consumption are reduced, accuracy is improved, and large-scale scene point clouds can be rapidly and effectively processed at one time.

Drawings

FIG. 1 is a semantic segmentation model of the present invention;

FIG. 2 is a sample of an input point cloud according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a DFA module according to the present invention;

FIG. 4 is a schematic diagram of an SDR block of the present invention;

fig. 5 is a graph of an input point cloud sample segmentation result according to an embodiment of the present invention.

Detailed Description

The following description of the technical solutions in the embodiments of the present application will be made clearly and completely with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.

It is noted that unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.

As shown in fig. 1-5, a large-scale point cloud semantic segmentation method combining a spatial depth convolution and a residual structure includes the following steps:

s1, constructing a semantic segmentation model, wherein the semantic segmentation model comprises a plurality of stages, each stage sequentially comprises an encoder and a decoder from an input side to an output side, a linear layer for expanding dimension is further arranged in front of the encoder of the first stage, and a segHead is further arranged behind the decoder of the first stage; the encoder of each stage combines the spatial depth convolution and residual structure to encode the input characteristics, strengthen semantic information and obtain the output characteristics of the encoder; the output characteristics of the encoders at other stages except the final stage are transmitted to the encoder at the next stage after passing through the downsampling layer; the output characteristics of the encoder of each stage and the output characteristics of the decoder of the next stage are input into the decoder of the stage together after passing through an up-sampling layer; the output characteristics of the first stage decoder pass through the SegHead to output the probability that each point in the point cloud belongs to each class.

The semantic segmentation model SDRNet is a hierarchical structure, the network structure of the semantic segmentation model SDRNet can be divided into a plurality of stages, each stage comprises an encoder and a decoder, a linear layer for expanding dimension is further included in front of the encoder in the first stage, and the semantic segmentation model SDRNet further comprises a segHead for outputting probability that each point in the point cloud belongs to each class after the decoder. The encoder of each stage combines the spatial depth convolution and residual structure to encode the input characteristics, strengthen semantic information and obtain the output characteristics of the encoder; the output characteristics of the encoders at other stages except the final stage are transmitted to the encoder at the next stage after passing through the downsampling layer; the output characteristics of the encoder of each stage and the output characteristics of the decoder of the next stage are input into the decoder of the stage together after passing through an up-sampling layer; the output characteristics of the first stage decoder pass through the SegHead to output the probability that each point in the point cloud belongs to each class.

Further, as shown in fig. 1, a four-stage semantic segmentation model SDRNet is constructed in this example, which is a large-scale point cloud semantic segmentation model combining spatial depth convolution and residual structure, and is beneficial to the balance between speed and precision. The method comprises the steps that firstly, the input features of a semantic segmentation model are expanded to 32 through a linear layer and then are input into a first-stage encoder, the output features of the first-stage encoder are directly transmitted to a first-stage decoder, the output features of the first-stage encoder are directly transmitted to a second-stage encoder after passing through a downsampling layer, the output features of the second-stage encoder are directly transmitted to a second-stage decoder, the output features of the second-stage encoder are directly transmitted to a third-stage encoder after passing through the downsampling layer, the output features of the third-stage encoder are directly transmitted to a third-stage decoder, and the output features of the third-stage encoder are directly transmitted to a fourth-stage encoder after passing through the downsampling layer; the decoder of the fourth stage transmits the output characteristic of the fourth stage encoder to the decoder of the third stage through an up-sampling layer, the decoder of the third stage carries out first merging of the output characteristic of the third stage encoder and the output characteristic of the fourth stage decoder, the output characteristic of the first stage encoder and the output characteristic of the third stage decoder are transmitted to the decoder of the second stage through the up-sampling layer after the first merging of the Concat, the output characteristic of the second stage encoder and the output characteristic of the third stage decoder carry out second merging of the Concat, the decoder of the first stage carries out third merging of the output characteristic of the first stage encoder and the output characteristic of the second stage decoder, the output characteristic of the first stage decoder outputs the output characteristic of the first stage decoder after the third merging of the Concat, and the output characteristic of the first stage decoder carries out probability that each point belongs to each class in the SegHead output point cloud. It should be noted that, according to the actual requirement, the SDRNet may also construct a network structure at any stage.

In one embodiment, the encoder includes at least one SDR block, and optionally a DFA module is added to the encoder, the DFA module being located on the input side of the SDR block.

Wherein the encoder comprises at least one SDR block, and a DFA module can be selectively added in the encoder, and the DFA module is positioned on the input side of the SDR block. In the encoder, the input features are encoded to output new features. The output characteristics of the encoder are downsampled and transferred to the encoder of the next stage except for the last stage. In this embodiment, the encoder in the first stage of the semantic segmentation model SDRNet includes a DFA module and two SDR blocks, the encoder in the second stage includes a DFA module and three SDR blocks, the encoder in the third stage includes a DFA module and six SDR blocks, and the encoder in the fourth stage includes three SDR blocks. It should be noted that the number of DFA modules and SDR blocks in each stage may also be adjusted according to the actual situation.

In an embodiment, the input features of the DFA module sequentially pass through the downsampling layer, the linear layer, the activation function, the SDC operation, the linear layer, the activation function, and the upsampling layer from the input side to the output side to obtain DFA intermediate features, the input features of the DFA module are further combined with the DFA intermediate features, and the output features of the DFA module are obtained through the linear layer and the activation function.

The DFA module comprises a downsampling layer, a linear layer, SDC operation, an upsampling layer and an activating function, wherein the linear layer and the activating function are multiple. The DFA module can also be selectively added with normalization processing after the operation of a linear layer or SDC, thereby helping to accelerate the convergence speed of the training stage and preventing overfitting. Further, as shown in fig. 3, N is the number of points in the point cloud, L is the linear layer, BN is the normalization processing batch normal, LR is the activation function leack Relu, and I is the dimension of the input feature. The input features of the DFA module are subjected to downsampling layers to obtain N/4 points in the point cloud as first DFA intermediate features, the first DFA intermediate features are sequentially subjected to linear layer L, normalization BN and an activation function LR to reduce the dimensionality of the input features to be I/2, and second DFA intermediate features are obtained, so that calculation consumption and memory consumption in subsequent convolution are further reduced. The second DFA intermediate feature then obtains a third DFA intermediate feature through SDC operations and normalization process BN in sequence. The third DFA intermediate feature is recombined after sequentially passing through the linear layer L, the normalization BN and the activation function LR without changing the dimension, and then the fourth DFA intermediate feature is obtained. And the dimensions in the operation process of the second DFA intermediate feature, the third DFA intermediate feature and the fourth DFA intermediate feature are I/2, the fourth DFA intermediate feature is restored to the original dimension through an upsampling layer, namely the number of points in the point cloud is N, so as to obtain a fifth DFA intermediate feature, the fifth DFA intermediate feature is combined with the input feature of the DFA module (combined through a Concat function), and then the fifth DFA intermediate feature is sequentially restored to the dimension I of the input feature through a linear layer L, normalization BN and an activation function LR, so that the output feature of the DFA module is obtained. It should be noted that the activation function in the DFA module may also use the activation function Relu.

In one embodiment, the upsampling layer samples in a nearest neighbor interpolation.

The sampling mode of the up-sampling layer is nearest neighbor interpolation, and the up-sampling layer is suitable for large-scale point clouds, and is minimum in calculated amount and highest in speed.

The sampling mode of the downsampling layer is random sampling, 1/4 part of the point cloud of the current stage is randomly selected as the point cloud of the next stage, sixteen neighbors in the original point cloud are found for each point after downsampling by using a KD-Tree algorithm during downsampling, feature information is fused through the maximum pooling operation, and noise is restrained.

In an embodiment, the input features of the SDR block sequentially pass through the linear layer, the SDC operation, the linear layer and the linear layer from the input side to the output side to obtain the intermediate features of the SDR, and the input features of the SDR block are further added to the intermediate features of the SDR through the shortcut and then the output features of the SDR block are obtained through the activation function.

The SDR block is a spatial depth residual block, and mainly combines spatial depth convolution and residual structure, and comprises three linear layers and one SDC operation. Normalization or activation functions may also be optionally added to the SDR block after linear layer or SDC operation. Further, as shown in fig. 4, N is the number of points in the point cloud, I is the dimension of the input feature, O is the dimension of the output feature, L is the linear layer, BN is the normalization processing batch normal, and LR is the activation function leack Relu. The input features of the SDR block firstly sequentially pass through a linear layer L, a normalization processing BN and an activating function LR to obtain a first SDR intermediate feature, the channel number of the first SDR intermediate feature is O/4, the calculation consumption and the memory consumption of subsequent convolution operation can be further reduced, the first SDR intermediate feature sequentially passes through an SDC operation and the normalization processing BN to obtain a second SDR intermediate feature, the second SDR intermediate feature sequentially passes through the linear layer L, the normalization processing BN and the activating function LR to obtain a third SDR intermediate feature, the third intermediate feature recombines the convolved second SDR intermediate feature without changing the dimension, the channel numbers of the first SDR intermediate feature, the second SDR intermediate feature and the third SDR intermediate feature are O/4, the third SDR intermediate feature sequentially passes through the linear layer L, the normalization processing BN and the activating function LR to obtain a fourth SDR intermediate feature, the dimension of the output feature is expanded by the linear layer L, the fourth SDR intermediate feature and the shttcut is added, and the output feature of the block is obtained by the activating function LR, and the problem of disappearance can be relieved.

In an embodiment, when the dimensions of the input feature and the output feature of the SDR block are equal, the input feature of the SDR block is shortcut; when the dimensions of the input features and the output features of the SDR block are unequal, the input features of the SDR block are expanded by the linear layer to obtain shortcut.

The shortcut has two cases, and when the I and O of the SDR block are equal, the input characteristic can be directly used as the shortcut; when the I and O of the SDR block are unequal, the input feature is used as a shortcut after the O dimension feature is obtained through a linear layer L. It should be noted that the activation function of the SDR block may also use an activation function Relu, etc.

In one embodiment, the normalization process or activation function may be selectively added after the linear layer or SDC operation.

The selective addition of normalization processing after the operation of the linear layer or SDC can accelerate convergence and prevent overfitting, and nonlinear factors can be introduced through activating functions, so that the expression capacity of the model is improved.

In one embodiment, the SDC operates as a spatial depth convolution, the feature f of the ith point after the convolution _i The' formula is as follows:

wherein p is _i And p _j Three-dimensional coordinates, Δp, of the i-th point and the j-th point, respectively _i，j ＝p _i -p _j Sigma is the data Δp _i，j Variance of f _j Is the characteristic of the jth point in the point cloud，Ω _i Is the neighbor index set of the ith point in the point cloud, j is omega _i Is a component of the group.

Wherein the SDC operates as a spatial depth convolution. Assume that there is a continuous function g (.): r is R ³ →R ^I Parameters of the convolution kernel at any position in space can be obtained, wherein I is the dimension of the input feature. Convoluting on the three-dimensional point cloud through a function g (·), and obtaining the characteristic f of the ith point after convolution through SDC operation _i The formula of':

wherein p is _i And p _j Three-dimensional coordinates, Δp, of the i-th point and the j-th point, respectively _i，j ＝p _i -p _j Sigma is the data Δp _i，j Variance of f _j Characteristic of the jth point in the point cloud, Ω _i Is the neighbor index set of the ith point in the point cloud, j is omega _i Is a component of the group. Omega shape _i The method can be obtained through a KD-Tree algorithm, which is a conventional technology well known to those skilled in the art and is not described herein. In this example g (-) is a linear layer and the Relu activation function, learned in training.

In particular, for Lei Dadian clouds, Δp is first of all _i，j Rotating to obtain Δp' _i，j ：

Wherein the three-dimensional coordinates of the ith point in the point cloud are (x _i ，y _i ，z _i ) Theta is then _i ＝tan ^-1 (y _i /x _i ). Thereby obtaining SDC operation for radar point cloud, and obtaining characteristic f of ith point after convolution _i ' formula:

in an embodiment, the decoder of the last stage includes a linear layer, the decoder of the other stages includes a Concat function and a linear layer sequentially from the input side to the output side, the Concat function combines the output characteristics of the encoder of the same stage with the output characteristics of the decoder of the next stage, and the linear layer is followed by a selective increase of the activation function.

Wherein, as shown in fig. 1, the decoder of the last stage includes a linear layer, and the decoders of the other stages include a Concat function operation and a linear layer. Except for the last stage, the decoder inputs the output characteristics of the encoder of the same stage and the output characteristics of the decoder of the next stage, namely, the decoder of the current stage carries out up-sampling on the output characteristics of the decoder of the next stage through an up-sampling layer, and the up-sampled characteristics and the output characteristics of the encoder of the current stage are input into a Concat function together for combination. In this embodiment, the input features of the current stage decoder are passed through a linear layer to obtain the output features. The decoder of the last stage directly inputs the output characteristics of the encoder of the same stage, and then obtains the output characteristics through a linear layer. In addition, the output characteristics of the decoder are transferred to the decoder of the previous stage except for the first stage. It should be noted that, the activation functions may be selectively added to each stage, for example, the decoder of the last stage includes a linear layer and an activation function, the decoder of the other stages includes a Concat function, a linear layer and an activation function, and the input features of the decoder of the current stage sequentially pass through a linear layer and an activation function to obtain the output features. The decoder of the last stage directly inputs the output characteristics of the encoder of the same stage, and then sequentially passes through a linear layer and an activation function to obtain the output characteristics. The output characteristics of the decoders of the other stages, except the first stage, are transferred to the decoder of the previous stage.

In one embodiment, the SegHead includes three linear layers and one softmax layer in order from the input side to the output side.

The SegHead comprises three linear layers and one softmax layer from the input side to the output side. The dimensions of the three linear layers are 64, 32, 19 in sequence. The features of each point pass through three linear layers and one softmax layer in turn, outputting the probability of each class.

S2, acquiring point cloud data of a preset scene to obtain a point set P= { P ₁ ，p ₂ ，…，p _i ，…，p _N Feature set f= { F corresponding to point set ₁ ，f ₂ ，…，f _i ，…，f _N P, where _i And f _i The three-dimensional coordinates and the features of the ith point in the point cloud are respectively, and N is the number of points in the point cloud. The scene point cloud sample acquired by the embodiment is shown in fig. 2.

S3, inputting the point set and the feature set into the semantic segmentation model.

S4, obtaining the probability of each point in the semantic segmentation model output point cloud.

If the trained network parameters exist, the semantic segmentation model directly adopts the trained network parameters, otherwise, the network parameters of the semantic segmentation model are trained first. The training steps are as follows:

a) And obtaining a point cloud data set with semantic labels, and dividing the point cloud into a training set and a verification set. In this embodiment, the SemanticKITTI Lei Dadian cloud data set is downloaded, the SemanticKI TTI sequences 00-07 and 09-10 are used as training sets, and the sequence 08 is used as verification set;

b) Randomly selecting point clouds of a plurality of scenes in the training set, inputting the point clouds into the semantic segmentation model SDRNet in the step S1, and outputting the probability of each point in the point clouds, wherein six scenes are selected in the embodiment;

c) Removing the unlabeled dots;

d) Carrying out one-hot coding on the label;

e) Calculating a loss function, in this example using the WCE loss function;

f) Optimizing network parameters by using an Adam optimizer;

g) Training on the training set a predetermined number of times goes to step h), otherwise go to step b). In this embodiment, a complete training round on the SemanticKITTI training set goes to step h), otherwise go to step b);

h) Evaluating model precision on the verification set, wherein the evaluation index is equal to the cross ratio mIoU, and if the model precision is not saved or is higher than the saved model precision, saving network parameters;

i) If the training reaches 100 rounds, ending the training, otherwise turning to the step b).

According to the method and the device, the semantic segmentation is carried out on the large-scale scene point cloud by constructing the multi-stage semantic segmentation model SDRNet, so that the point cloud information in the scene can be extracted efficiently, and a high-precision semantic segmentation result is obtained. Further, in the embodiment of the application, by constructing the four-stage semantic segmentation model SDRNet to carry out semantic segmentation on the large-scale scene point cloud, different categories can be segmented in the point cloud, the unmanned vehicle is helped to analyze the environment in the scene, the high precision is maintained, the calculation speed is high, and the efficiency is further improved. Further, in the unmanned vehicle application scene, the unmanned vehicle acquires point cloud data in the traveling scene through a radar (such as a velodyne64 laser radar) in the traveling process, a point set and a feature set of the point cloud are obtained, the probability of each point in the point cloud is output after the point set and the feature set are input into a semantic segmentation model SDRNet, the classification with the maximum probability of each point is selected as a prediction label, and a unmanned vehicle traveling scene point cloud segmentation result is obtained according to the prediction label. As shown in fig. 5, the semantic segmentation result diagram of the unmanned scene point cloud is that different gray scales in the diagram represent different categories, and gray scale colors from light to dark represent plants, floors, buildings, people and vehicles respectively. The plant can be further subdivided into trunks, grasslands and others, the ground can be divided into roads, sidewalks, parking spaces and others, the building can be divided into houses, fences, posts, traffic signs and others, the person can be divided into pedestrians, cyclists and motorcyclists, and the vehicle can be divided into cars, bicycles, motorcycles, trucks and others.

The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.

The foregoing examples merely represent a more specific and detailed description of the present application and are not therefore to be construed as limiting the scope of the invention. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application is to be determined by the claims appended hereto.

Claims

1. A large-scale point cloud semantic segmentation method combining a spatial depth convolution and a residual structure is characterized by comprising the following steps of: the large-scale point cloud semantic segmentation method combining the spatial depth convolution and the residual error structure comprises the following steps of:

s1, constructing a semantic segmentation model, wherein the semantic segmentation model comprises a plurality of stages, each stage sequentially comprises an encoder and a decoder from an input side to an output side, a linear layer for expanding dimension is further arranged in front of the encoder of a first stage, and a segHead is further arranged behind the decoder of the first stage; the encoder in each stage encodes the input characteristics by combining the spatial depth convolution and the residual error structure, strengthens semantic information and obtains the output characteristics of the encoder; the output characteristics of the encoders at other stages except the final stage are transmitted to the encoder at the next stage after passing through the downsampling layer; the output characteristics of the encoder of each stage and the output characteristics of the decoder of the next stage are input into the decoder of the stage together after passing through an up-sampling layer; the output characteristics of the first stage decoder pass through the probability that each point in the SegHead output point cloud belongs to each class;

the encoder comprises at least one SDR block, wherein a DFA module can be selectively added in the encoder, and the DFA module is positioned on the input side of the SDR block;

the input features of the DFA module are sequentially subjected to a downsampling layer, a linear layer, an activation function, SDC operation, the linear layer, the activation function and an upsampling layer from the input side to the output side to obtain intermediate features of the DFA module, the input features of the DFA module are combined with the intermediate features of the DFA module, and the output features of the DFA module are obtained through the linear layer and the activation function;

the input features of the SDR block sequentially pass through a linear layer, SDC operation, a linear layer and a linear layer from an input side to an output side to obtain intermediate features of the SDR block, and the input features of the SDR block are added with the intermediate features of the SDR block through shortcut and then the output features of the SDR block are obtained through an activation function;

the decoder of the last stage comprises a linear layer, the decoders of other stages sequentially comprise a Concat function and a linear layer from an input side to an output side, the Concat function combines the output characteristics of the encoder of the same stage with the output characteristics of the decoder of the next stage, and an activation function can be selectively added after the linear layer;

s2, acquiring point cloud data of a preset scene to obtain a point set P= { P ₁ ，p ₂ ，…，p _i ，…，p _N Feature set f= { F corresponding to the point set } ₁ ，f ₂ ，…，f _i ，…，f _N P, where _i And f _i Respectively three-dimensional coordinates and characteristics of an ith point in the point cloud, wherein N is the number of points in the point cloud;

s3, inputting the point set and the feature set into the semantic segmentation model;

2. The large scale point cloud semantic segmentation method combining spatial depth convolution and residual structure according to claim 1, wherein: the sampling mode of the up-sampling layer is nearest neighbor interpolation.

3. The large scale point cloud semantic segmentation method combining spatial depth convolution and residual structure according to claim 1, wherein: when the dimensions of the input feature and the output feature of the SDR block are equal, the input feature of the SDR block is the shortcut; and when the dimensionalities of the input features and the output features of the SDR block are unequal, the input features of the SDR block are expanded by a linear layer to obtain the shortcut.

4. A large scale point cloud semantic segmentation method combining spatial depth convolution and residual structure as claimed in claim 1 or 3, wherein: the linear layer or SDC may be operated upon to selectively add normalization processes or activation functions.

5. The large scale point cloud semantic segmentation method combining spatial depth convolution and residual structure according to claim 4, wherein: the SDC operation is spatial depth convolution, and the characteristic f 'of the ith point after convolution' _i The formula is as follows:

wherein p is _i And p _j Three-dimensional coordinates, Δp, of the i-th point and the j-th point, respectively _i,j ＝p _i -p _j Sigma is the data Δp _i,j Variance of f _j Characteristic of the jth point in the point cloud, Ω _i Is the neighbor index set of the ith point in the point cloud, j is omega _i The elements in (a) and the continuous function g (R) ³ →R ^I I.e. for combining three-dimensional vectors R ³ Conversion to an I-dimensional vector R ^I I is the dimension of the input feature of the SDC operation.

6. The large scale point cloud semantic segmentation method combining spatial depth convolution and residual structure according to claim 1, wherein: the SegHead comprises three linear layers and one softmax layer in order from the input side to the output side.