CN112215231B - Large-scale point cloud semantic segmentation method combining spatial depth convolution and residual structure


Info

Publication number
CN112215231B
CN112215231B (application CN202011048758.4A)
Authority
CN
China
Prior art keywords
point cloud
point
stage
semantic segmentation
sdr
Prior art date
Legal status
Active
Application number
CN202011048758.4A
Other languages
Chinese (zh)
Other versions
CN112215231A (en)
Inventor
刘盛
黄圣跃
程豪豪
沈家瑜
陈胜勇
Current Assignee
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202011048758.4A
Publication of CN112215231A
Application granted
Publication of CN112215231B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks


Abstract

The invention discloses a large-scale point cloud semantic segmentation method combining spatial depth convolution and a residual structure, comprising the following steps: S1, construct a semantic segmentation model; S2, acquire point cloud data of a preset scene to obtain a point set P = {p_1, p_2, …, p_i, …, p_N} and a feature set F = {f_1, f_2, …, f_i, …, f_N}, where p_i and f_i are respectively the three-dimensional coordinates and the features of the i-th point in the point cloud, and N is the number of points in the point cloud; S3, input the point set and the feature set into the semantic segmentation model; S4, obtain from the model the probability that each point in the point cloud belongs to each class; S5, select the class with the maximum probability at each point as its prediction label, and obtain the point cloud segmentation result of the preset scene from the prediction labels. By combining spatial depth convolution with a residual structure for point cloud semantic segmentation, the method reduces memory and computation consumption, improves accuracy, and can rapidly and effectively process a large-scale scene point cloud in one pass.

Description

Large-scale point cloud semantic segmentation method combining spatial depth convolution and residual structure
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a large-scale point cloud semantic segmentation method combining a spatial depth convolution and a residual structure.
Background
Point cloud semantic segmentation based on deep learning has developed rapidly in recent years, but many existing methods suffer from the following technical defects: memory consumption is excessive, so large-scale scene point clouds cannot be processed directly in one pass and must be processed block by block; computation consumption is excessive, so scene point clouds cannot be semantically segmented rapidly; and accuracy is low, because the constructed network structures are not deep enough and their receptive fields are insufficient, resulting in inadequate semantic information.
Disclosure of Invention
To address these problems, the invention provides a large-scale point cloud semantic segmentation method combining spatial depth convolution and a residual structure, which reduces memory and computation consumption, improves accuracy, and can rapidly and effectively process large-scale scene point clouds in one pass.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
the invention provides a large-scale point cloud semantic segmentation method combining a spatial depth convolution and a residual structure, which comprises the following steps:
s1, constructing a semantic segmentation model, wherein the semantic segmentation model comprises a plurality of stages, each stage sequentially comprises an encoder and a decoder from an input side to an output side, a linear layer for expanding dimension is further arranged in front of the encoder of the first stage, and a segHead is further arranged behind the decoder of the first stage; the encoder of each stage combines the spatial depth convolution and residual structure to encode the input characteristics, strengthen semantic information and obtain the output characteristics of the encoder; the output characteristics of the encoders at other stages except the final stage are transmitted to the encoder at the next stage after passing through the downsampling layer; the output characteristics of the encoder of each stage and the output characteristics of the decoder of the next stage are input into the decoder of the stage together after passing through an up-sampling layer; the output characteristics of the first stage decoder output the probability that each point in the point cloud belongs to each class through the SegHead;
s2, acquiring point cloud data of a preset scene to obtain a point set P= { P 1 ,p 2 ,…,p i ,…,p N Feature set f= { F corresponding to point set 1 ,f 2 ,…,f i ,…,f N P, where i And f i Respectively three-dimensional coordinates and characteristics of an ith point in the point cloud, wherein N is the number of points in the point cloud;
s3, inputting the point set and the feature set into a semantic segmentation model;
s4, obtaining the probability of each point in the semantic segmentation model output point cloud;
s5, selecting the classification with the maximum probability of each point as a prediction label, and obtaining a point cloud data segmentation result of the preset scene according to the prediction label.
Preferably, the encoder comprises at least one SDR block, optionally with a DFA module added to the encoder, the DFA module being located on the input side of the SDR block.
Preferably, the input features of the DFA module sequentially pass through the downsampling layer, the linear layer, the activation function, the SDC operation, the linear layer, the activation function and the upsampling layer from the input side to the output side to obtain DFA intermediate features, the input features of the DFA module are further combined with the DFA intermediate features, and the output features of the DFA module are obtained through the linear layer and the activation function.
Preferably, the sampling mode of the up-sampling layer is nearest neighbor interpolation.
Preferably, the input features of the SDR block sequentially pass through a linear layer, an SDC operation, and two further linear layers from the input side to the output side to obtain the intermediate features of the SDR block; the input features of the SDR block are added to these intermediate features through a shortcut, and the output features of the SDR block are then obtained through an activation function.
Preferably, when the dimensions of the input and output features of the SDR block are equal, the input features of the SDR block serve as the shortcut; when they are unequal, the input features of the SDR block are expanded by a linear layer to obtain the shortcut.
Preferably, the normalization process or activation function may be selectively added after the linear layer or SDC operation.
Preferably, the SDC operation is a spatial depth convolution, and the feature f'_i of the i-th point after convolution is given by:

f'_i = Σ_{j∈Ω_i} g(Δp_{i,j}/σ) ⊙ f_j

wherein p_i and p_j are respectively the three-dimensional coordinates of the i-th and j-th points, Δp_{i,j} = p_i − p_j, σ is the variance of the data Δp_{i,j}, f_j is the feature of the j-th point in the point cloud, Ω_i is the neighbor index set of the i-th point in the point cloud, j is an element of Ω_i, g(·): R^3 → R^I is the convolution kernel function, and ⊙ denotes element-wise (depthwise) multiplication.
Preferably, the decoder of the last stage comprises a linear layer, and the decoders of the other stages comprise, in order from the input side to the output side, a Concat function and a linear layer; the Concat function merges the output features of the encoder of the same stage with the output features of the decoder of the next stage, and an activation function may be selectively added after the linear layer.
Preferably, the SegHead comprises three linear layers and one softmax layer in order from the input side to the output side.
Compared with the prior art, the invention has the following beneficial effects: spatial depth convolution and a residual structure are combined to perform point cloud semantic segmentation, which reduces memory and computation consumption, improves accuracy, and enables large-scale scene point clouds to be processed rapidly and effectively in one pass.
Drawings
FIG. 1 is a semantic segmentation model of the present invention;
FIG. 2 is a sample of an input point cloud according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a DFA module according to the present invention;
FIG. 4 is a schematic diagram of an SDR block of the present invention;
fig. 5 is a graph of an input point cloud sample segmentation result according to an embodiment of the present invention.
Detailed Description
The following description of the technical solutions in the embodiments of the present application will be made clearly and completely with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
It is noted that unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
As shown in fig. 1-5, a large-scale point cloud semantic segmentation method combining a spatial depth convolution and a residual structure includes the following steps:
s1, constructing a semantic segmentation model, wherein the semantic segmentation model comprises a plurality of stages, each stage sequentially comprises an encoder and a decoder from an input side to an output side, a linear layer for expanding dimension is further arranged in front of the encoder of the first stage, and a segHead is further arranged behind the decoder of the first stage; the encoder of each stage combines the spatial depth convolution and residual structure to encode the input characteristics, strengthen semantic information and obtain the output characteristics of the encoder; the output characteristics of the encoders at other stages except the final stage are transmitted to the encoder at the next stage after passing through the downsampling layer; the output characteristics of the encoder of each stage and the output characteristics of the decoder of the next stage are input into the decoder of the stage together after passing through an up-sampling layer; the output characteristics of the first stage decoder pass through the SegHead to output the probability that each point in the point cloud belongs to each class.
The semantic segmentation model, named SDRNet, is a hierarchical structure organized into the stages described in step S1, with the dimension-expanding linear layer before the first-stage encoder and the SegHead, which outputs the per-class probability of each point, after the first-stage decoder.
Further, as shown in fig. 1, a four-stage semantic segmentation model SDRNet is constructed in this embodiment: a large-scale point cloud semantic segmentation model combining spatial depth convolution and a residual structure that balances speed and accuracy. The input features of the model are first expanded to 32 dimensions by a linear layer and fed into the first-stage encoder. The output features of each of the first three encoders are passed directly to the decoder of the same stage and, after a downsampling layer, to the encoder of the next stage. The fourth-stage decoder processes the output features of the fourth-stage encoder, and its output is transmitted through an upsampling layer to the third-stage decoder, which performs a first Concat merge with the output features of the third-stage encoder. The third-stage decoder output is upsampled and merged (second Concat) with the second-stage encoder output in the second-stage decoder; the second-stage decoder output is in turn upsampled and merged (third Concat) with the first-stage encoder output in the first-stage decoder. The output features of the first-stage decoder then pass through the SegHead, which outputs the probability that each point in the point cloud belongs to each class. It should be noted that SDRNet may also be constructed with any number of stages according to actual requirements. A runnable structural sketch of this wiring is given below.
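The wiring just described can be made concrete with a minimal skeleton. Everything below is illustrative only: each encoder, downsampling layer and decoder is collapsed to a single linear layer so the skeleton runs as-is, the per-stage feature widths beyond the initial 32 are assumptions, and the real model replaces these stand-ins with the DFA modules, SDR blocks and point-set down/up-sampling described in the following sections.

```python
import torch
import torch.nn as nn

DIMS = [32, 64, 128, 256]  # assumed per-stage feature widths (only 32 is stated)

class SDRNetSkeleton(nn.Module):
    def __init__(self, in_dim: int = 6, n_classes: int = 19):
        super().__init__()
        self.expand = nn.Linear(in_dim, DIMS[0])        # expand input features to 32
        self.enc = nn.ModuleList(nn.Linear(d, d) for d in DIMS)
        self.down = nn.ModuleList(nn.Linear(DIMS[i], DIMS[i + 1]) for i in range(3))
        self.dec4 = nn.Linear(DIMS[3], DIMS[3])         # last-stage decoder: linear only
        self.dec = nn.ModuleList(                       # other decoders: Concat + linear
            nn.Linear(DIMS[i] + DIMS[i + 1], DIMS[i]) for i in range(3))
        self.seg_head = nn.Linear(DIMS[0], n_classes)   # stand-in for the SegHead

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        skips, x = [], self.expand(feats)
        for i in range(4):                              # encoder path
            x = self.enc[i](x)
            skips.append(x)                             # kept for the same-stage decoder
            if i < 3:
                x = self.down[i](x)                     # downsampling stand-in
        d = self.dec4(skips[3])
        for i in (2, 1, 0):                             # decoder path (upsampling elided)
            d = self.dec[i](torch.cat([skips[i], d], dim=-1))
        return self.seg_head(d)                         # per-point class scores

probs = SDRNetSkeleton()(torch.randn(1024, 6)).softmax(dim=-1)  # (1024, 19)
```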
In one embodiment, the encoder includes at least one SDR block, and optionally a DFA module is added to the encoder, the DFA module being located on the input side of the SDR block.
In the encoder, the input features are encoded to output new features; except in the last stage, the encoder output features are downsampled and transferred to the encoder of the next stage. In this embodiment, the first-stage encoder of the semantic segmentation model SDRNet comprises one DFA module and two SDR blocks, the second-stage encoder comprises one DFA module and three SDR blocks, the third-stage encoder comprises one DFA module and six SDR blocks, and the fourth-stage encoder comprises three SDR blocks. It should be noted that the number of DFA modules and SDR blocks in each stage may be adjusted according to the actual situation.
In an embodiment, the input features of the DFA module sequentially pass through the downsampling layer, the linear layer, the activation function, the SDC operation, the linear layer, the activation function, and the upsampling layer from the input side to the output side to obtain DFA intermediate features, the input features of the DFA module are further combined with the DFA intermediate features, and the output features of the DFA module are obtained through the linear layer and the activation function.
The DFA module comprises a downsampling layer, several linear layers with activation functions, an SDC operation, and an upsampling layer. Normalization can also be selectively added after a linear layer or the SDC operation, which helps accelerate convergence during training and prevents overfitting. Further, as shown in fig. 3, N is the number of points in the point cloud, L is a linear layer, BN is batch normalization, LR is the LeakyReLU activation function, and I is the dimension of the input features. The input features of the DFA module first pass through the downsampling layer, keeping N/4 points of the point cloud, to give the first DFA intermediate feature. This passes through a linear layer L, BN, and an LR activation in sequence, reducing the dimension to I/2 and giving the second DFA intermediate feature, which further reduces the computation and memory consumed by the subsequent convolution. The second DFA intermediate feature then passes through the SDC operation and BN to give the third DFA intermediate feature, which is recombined, without changing its dimension, by a further L, BN, LR sequence into the fourth DFA intermediate feature; the second, third and fourth DFA intermediate features all have dimension I/2. The fourth DFA intermediate feature is restored by the upsampling layer to the original number of points N, giving the fifth DFA intermediate feature, which is merged with the input features of the DFA module through a Concat function and then restored to the input dimension I by a final L, BN, LR sequence, yielding the output features of the DFA module. It should be noted that the DFA module may also use the ReLU activation function. A module-level sketch follows.
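A sketch of this DFA pipeline, under stated assumptions, might look as follows; the SDC operation is injected as a callable (such as the SDC sketch given later), and the `down_idx`/`up_idx` index tensors standing in for the down- and up-sampling layers, like BatchNorm over points, are illustrative simplifications rather than the patent's exact mechanism.

```python
import torch
import torch.nn as nn

class DFA(nn.Module):
    """DFA module sketch (Fig. 3): down -> L+BN+LR (I -> I/2) -> SDC+BN ->
    L+BN+LR -> up, then Concat with the input and L+BN+LR back to dim I."""
    def __init__(self, dim: int, sdc: nn.Module):
        super().__init__()
        half = dim // 2
        self.reduce = nn.Sequential(nn.Linear(dim, half),
                                    nn.BatchNorm1d(half), nn.LeakyReLU())
        self.sdc, self.bn = sdc, nn.BatchNorm1d(half)
        self.mix = nn.Sequential(nn.Linear(half, half),
                                 nn.BatchNorm1d(half), nn.LeakyReLU())
        self.out = nn.Sequential(nn.Linear(dim + half, dim),
                                 nn.BatchNorm1d(dim), nn.LeakyReLU())

    def forward(self, feats, down_idx, up_idx, dp, nbr):
        x = feats[down_idx]                  # downsample: keep N/4 points
        x = self.reduce(x)                   # dimension I -> I/2
        x = self.bn(self.sdc(x, dp, nbr))    # SDC on the subsampled cloud
        x = self.mix(x)                      # recombine, dimension unchanged
        x = x[up_idx]                        # nearest-neighbor upsample back to N
        return self.out(torch.cat([feats, x], dim=-1))  # Concat, restore dim I
```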
In one embodiment, the upsampling layer samples in a nearest neighbor interpolation.
The up-sampling layer uses nearest-neighbor interpolation, which is well suited to large-scale point clouds: it has the smallest computational cost and the highest speed. A sketch is given below.
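As an illustration of nearest-neighbor interpolation on point sets, the following sketch uses SciPy's cKDTree (an implementation choice assumed here, not named by the patent) to copy, for each point of the denser stage, the feature of its nearest point in the sparser stage.

```python
import numpy as np
from scipy.spatial import cKDTree

def upsample_nearest(sparse_pts, sparse_feats, dense_pts):
    """Each dense point takes the feature of its nearest sparse point."""
    _, idx = cKDTree(sparse_pts).query(dense_pts, k=1)   # (N_dense,)
    return sparse_feats[idx]

# usage sketch
pts = np.random.rand(1024, 3)
sub = pts[np.random.choice(1024, 256, replace=False)]
feats_dense = upsample_nearest(sub, np.random.rand(256, 32), pts)  # (1024, 32)
```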
The downsampling layer uses random sampling: one quarter of the current stage's point cloud is randomly selected as the next stage's point cloud. During downsampling, a KD-Tree algorithm is used to find sixteen neighbors in the original point cloud for each retained point, and their feature information is fused through a max-pooling operation, which suppresses noise. A sketch follows.
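The downsampling step can be sketched in the same style; again SciPy's cKDTree stands in for the KD-Tree algorithm mentioned in the text.

```python
import numpy as np
from scipy.spatial import cKDTree

def downsample(points, feats, ratio=4, k=16):
    """Random 1/4 subsampling with KD-tree neighbor max-pooling (a sketch)."""
    n = points.shape[0]
    keep = np.random.choice(n, n // ratio, replace=False)  # random 1/4 of points
    sub_pts = points[keep]
    # sixteen neighbors of each kept point, searched in the ORIGINAL cloud
    _, idx = cKDTree(points).query(sub_pts, k=k)           # (n//ratio, k)
    sub_feats = feats[idx].max(axis=1)                     # max-pool to fuse features
    return sub_pts, sub_feats
```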
In an embodiment, the input features of the SDR block sequentially pass through a linear layer, an SDC operation, and two further linear layers from the input side to the output side to obtain the intermediate features of the SDR block; the input features of the SDR block are added to these intermediate features through a shortcut, and the output features of the SDR block are then obtained through an activation function.
The SDR block is a spatial depth residual block that combines spatial depth convolution with a residual structure; it comprises three linear layers and one SDC operation. Normalization or activation functions may also be selectively added after a linear layer or the SDC operation. Further, as shown in fig. 4, N is the number of points in the point cloud, I is the dimension of the input features, O is the dimension of the output features, L is a linear layer, BN is batch normalization, and LR is the LeakyReLU activation function. The input features of the SDR block first pass through L, BN, and LR in sequence to give the first SDR intermediate feature with O/4 channels, which reduces the computation and memory consumption of the subsequent convolution operation. The first SDR intermediate feature passes through the SDC operation and BN to give the second SDR intermediate feature; this passes through L, BN, and LR to give the third SDR intermediate feature, which recombines the convolved feature without changing its dimension. The first, second and third SDR intermediate features all have O/4 channels. The third SDR intermediate feature then passes through a further linear layer L (with BN) that expands the dimension to the output dimension O, giving the fourth SDR intermediate feature. The fourth SDR intermediate feature is added to the shortcut, and the output features of the SDR block are obtained through the LR activation function; this residual addition helps alleviate the gradient vanishing problem. A sketch of the block follows.
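A PyTorch-style sketch of the SDR block, under the assumptions stated in the comments (BN placement simplified, SDC injected as a callable such as the sketch given after the formula below), might look like this:

```python
import torch
import torch.nn as nn

class SDRBlock(nn.Module):
    """Spatial depth residual block sketch (Fig. 4):
    Linear -> SDC -> Linear -> Linear plus shortcut."""
    def __init__(self, in_dim: int, out_dim: int, sdc: nn.Module):
        super().__init__()
        mid = out_dim // 4                               # bottleneck width O/4
        self.reduce = nn.Sequential(nn.Linear(in_dim, mid),
                                    nn.BatchNorm1d(mid), nn.LeakyReLU())
        self.sdc, self.bn = sdc, nn.BatchNorm1d(mid)
        self.mix = nn.Sequential(nn.Linear(mid, mid),
                                 nn.BatchNorm1d(mid), nn.LeakyReLU())
        self.expand = nn.Linear(mid, out_dim)            # back to O dimensions
        # shortcut: identity when I == O, otherwise a linear projection to O
        self.shortcut = (nn.Identity() if in_dim == out_dim
                         else nn.Linear(in_dim, out_dim))
        self.act = nn.LeakyReLU()

    def forward(self, feats, dp, nbr):
        x = self.reduce(feats)              # first intermediate feature, O/4 channels
        x = self.bn(self.sdc(x, dp, nbr))   # second intermediate feature (convolved)
        x = self.mix(x)                     # third intermediate feature
        x = self.expand(x)                  # fourth intermediate feature, O channels
        return self.act(x + self.shortcut(feats))  # residual addition
```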
In an embodiment, when the dimensions of the input feature and the output feature of the SDR block are equal, the input feature of the SDR block is shortcut; when the dimensions of the input features and the output features of the SDR block are unequal, the input features of the SDR block are expanded by the linear layer to obtain shortcut.
The shortcut has two cases: when I and O of the SDR block are equal, the input features can be used directly as the shortcut; when I and O are unequal, the input features first pass through a linear layer L to obtain O-dimensional features, which then serve as the shortcut. It should be noted that the activation function of the SDR block may also be ReLU, etc.
In one embodiment, the normalization process or activation function may be selectively added after the linear layer or SDC operation.
Selectively adding normalization after a linear layer or the SDC operation accelerates convergence and prevents overfitting, while activation functions introduce nonlinearity and improve the expressive capacity of the model.
In one embodiment, the SDC operation is a spatial depth convolution. Assume a continuous function g(·): R^3 → R^I that yields the parameters of the convolution kernel at any position in space, where I is the dimension of the input features. Convolving over the three-dimensional point cloud with g(·), the SDC operation gives the feature f'_i of the i-th point after convolution as:

f'_i = Σ_{j∈Ω_i} g(Δp_{i,j}/σ) ⊙ f_j

wherein p_i and p_j are respectively the three-dimensional coordinates of the i-th and j-th points, Δp_{i,j} = p_i − p_j, σ is the variance of the data Δp_{i,j}, f_j is the feature of the j-th point in the point cloud, Ω_i is the neighbor index set of the i-th point in the point cloud, j is an element of Ω_i, and ⊙ denotes element-wise (depthwise) multiplication. Ω_i can be obtained through the KD-Tree algorithm, a conventional technique well known to those skilled in the art that is not described here. In this example, g(·) is a linear layer followed by a ReLU activation function, learned during training. A sketch of the operation in code follows.
In particular, for radar point clouds, Δp_{i,j} is first rotated to obtain Δp'_{i,j}:

Δp'_{i,j} = R(θ_i) Δp_{i,j}

wherein the three-dimensional coordinates of the i-th point in the point cloud are (x_i, y_i, z_i), θ_i = tan⁻¹(y_i / x_i), and R(θ_i) is a rotation about the z-axis by the azimuth θ_i. This yields the SDC operation for radar point clouds, with the feature f'_i of the i-th point after convolution given by:

f'_i = Σ_{j∈Ω_i} g(Δp'_{i,j}/σ) ⊙ f_j

A code sketch of the rotation follows.
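A sketch of the radar-specific rotation, assuming a z-axis rotation by the azimuth θ_i; the exact matrix in the patent figure is not reproduced, so the sign convention here is an assumption.

```python
import torch

def rotate_offsets(points: torch.Tensor, dp: torch.Tensor) -> torch.Tensor:
    """Rotate each offset dp[i] (shape (N, K, 3)) about the z-axis by the
    azimuth of point i. atan2 is used for numerical robustness; the text
    writes theta_i = tan^-1(y_i / x_i)."""
    theta = torch.atan2(points[:, 1], points[:, 0])               # (N,)
    c, s = torch.cos(theta)[:, None], torch.sin(theta)[:, None]   # (N, 1)
    x, y, z = dp[..., 0], dp[..., 1], dp[..., 2]                  # each (N, K)
    return torch.stack([c * x + s * y, -s * x + c * y, z], dim=-1)
```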
in an embodiment, the decoder of the last stage includes a linear layer, the decoder of the other stages includes a Concat function and a linear layer sequentially from the input side to the output side, the Concat function combines the output characteristics of the encoder of the same stage with the output characteristics of the decoder of the next stage, and the linear layer is followed by a selective increase of the activation function.
As shown in fig. 1, the decoder of the last stage comprises a linear layer, and the decoders of the other stages comprise a Concat operation and a linear layer. Except in the last stage, each decoder takes as input the output features of the encoder of the same stage together with the output features of the decoder of the next stage: the current stage upsamples the next-stage decoder output through an upsampling layer, and the upsampled features are fed, together with the current-stage encoder output, into the Concat function for merging. In this embodiment, the merged input features of the current-stage decoder then pass through a linear layer to give the output features. The last-stage decoder directly takes the output features of the encoder of the same stage and obtains its output features through a linear layer. Except in the first stage, the output features of each decoder are transferred to the decoder of the previous stage. It should be noted that activation functions may be selectively added at each stage, in which case the last-stage decoder comprises a linear layer and an activation function, and the decoders of the other stages comprise a Concat function, a linear layer and an activation function, with the decoder input features passing through the linear layer and the activation function in sequence to give the output features.
In one embodiment, the SegHead includes three linear layers and one softmax layer in order from the input side to the output side.
The SegHead comprises three linear layers and one softmax layer in order from the input side to the output side. The output dimensions of the three linear layers are 64, 32 and 19 in sequence. The features of each point pass through the three linear layers and the softmax layer in turn, outputting the probability of each class. A sketch follows.
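As a sketch, the SegHead maps each point's feature vector through the three linear layers and the softmax; the input width of 32 is an assumption chosen to match the first-stage decoder output, and 19 is the SemanticKITTI class count.

```python
import torch.nn as nn

seg_head = nn.Sequential(
    nn.Linear(32, 64),
    nn.Linear(64, 32),
    nn.Linear(32, 19),
    nn.Softmax(dim=-1),   # per-point probability over the 19 classes
)
```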
S2, acquiring point cloud data of a preset scene to obtain a point set P = {p_1, p_2, …, p_i, …, p_N} and a corresponding feature set F = {f_1, f_2, …, f_i, …, f_N}, where p_i and f_i are respectively the three-dimensional coordinates and the features of the i-th point in the point cloud, and N is the number of points in the point cloud. The scene point cloud sample acquired in this embodiment is shown in fig. 2.
S3, inputting the point set and the feature set into the semantic segmentation model.
S4, obtaining from the semantic segmentation model the probability that each point in the point cloud belongs to each class.
S5, selecting the class with the maximum probability at each point as its prediction label, and obtaining the point cloud data segmentation result of the preset scene from the prediction labels.
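Steps S4 and S5 reduce to an argmax over the per-point class probabilities; for illustration, with a stand-in probability matrix:

```python
import numpy as np

probs = np.random.rand(1024, 19)      # stand-in for the (N, C) model output of S4
pred_labels = probs.argmax(axis=1)    # S5: predicted class index per point, (N,)
```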
If the trained network parameters exist, the semantic segmentation model directly adopts the trained network parameters, otherwise, the network parameters of the semantic segmentation model are trained first. The training steps are as follows:
a) Obtain a point cloud data set with semantic labels and divide it into a training set and a validation set. In this embodiment, the SemanticKITTI radar point cloud dataset is downloaded; SemanticKITTI sequences 00-07 and 09-10 are used as the training set, and sequence 08 as the validation set;
b) Randomly selecting point clouds of a plurality of scenes in the training set, inputting the point clouds into the semantic segmentation model SDRNet in the step S1, and outputting the probability of each point in the point clouds, wherein six scenes are selected in the embodiment;
c) Removing the unlabeled dots;
d) Carrying out one-hot coding on the label;
e) Calculating the loss function; this example uses the weighted cross-entropy (WCE) loss;
f) Optimizing network parameters by using an Adam optimizer;
g) If the training set has been traversed a predetermined number of times, go to step h); otherwise go to step b). In this embodiment, step h) is entered after one complete training round over the SemanticKITTI training set;
h) Evaluating the model accuracy on the validation set, with the mean intersection-over-union (mIoU) as the evaluation metric; if no model has been saved yet, or the accuracy is higher than that of the saved model, save the network parameters;
i) If the training reaches 100 rounds, ending the training, otherwise turning to the step b).
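The training procedure in steps b)-i) can be sketched as a standard PyTorch loop; `model`, the scene batches and the per-class weights of the WCE loss are placeholders for this illustration, with negative labels standing in for unlabeled points.

```python
import torch
import torch.nn as nn

model = nn.Linear(6, 19)                      # stand-in for SDRNet
class_weights = torch.ones(19)                # placeholder WCE class weights
train_scenes = [(torch.randn(1024, 6), torch.randint(-1, 19, (1024,)))]

optimizer = torch.optim.Adam(model.parameters())              # step f): Adam
criterion = nn.CrossEntropyLoss(weight=class_weights)         # weighted cross-entropy

for epoch in range(100):                      # step i): stop after 100 rounds
    for feats, labels in train_scenes:        # step b): randomly selected scenes
        mask = labels >= 0                    # step c): remove unlabeled points
        logits = model(feats)                 # per-point class scores
        # steps d)-e): one-hot encoding is implicit in CrossEntropyLoss
        loss = criterion(logits[mask], labels[mask])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # step h): evaluate mIoU on the validation set and save the best weights
```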
By constructing the multi-stage semantic segmentation model SDRNet, the present application performs semantic segmentation on large-scale scene point clouds, efficiently extracting point cloud information from the scene and obtaining high-accuracy segmentation results. Further, in this embodiment the four-stage SDRNet segments the different categories in the point cloud, helping an unmanned vehicle analyze its surroundings while maintaining high accuracy and a high computation speed, which further improves efficiency. In an unmanned-vehicle application scenario, the vehicle acquires point cloud data of the driving scene through a radar (such as a Velodyne 64-beam laser radar) while traveling, yielding the point set and feature set of the point cloud. These are input into the semantic segmentation model SDRNet, which outputs the per-class probability of each point; the class with the maximum probability at each point is selected as its prediction label, and the segmentation result of the driving scene is obtained from the prediction labels. Fig. 5 shows the semantic segmentation result of an unmanned-driving scene point cloud, where different gray levels represent different categories: from light to dark, plants, ground, buildings, people and vehicles. Plants can be further subdivided into trunks, grass and others; the ground into roads, sidewalks, parking spaces and others; buildings into houses, fences, poles, traffic signs and others; people into pedestrians, cyclists and motorcyclists; and vehicles into cars, bicycles, motorcycles, trucks and others.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The foregoing examples merely represent a more specific and detailed description of the present application and are not therefore to be construed as limiting the scope of the invention. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application is to be determined by the claims appended hereto.

Claims (6)

1. A large-scale point cloud semantic segmentation method combining spatial depth convolution and a residual structure, characterized by comprising the following steps:
s1, constructing a semantic segmentation model, wherein the semantic segmentation model comprises a plurality of stages, each stage sequentially comprises an encoder and a decoder from an input side to an output side, a linear layer for expanding dimension is further arranged in front of the encoder of a first stage, and a segHead is further arranged behind the decoder of the first stage; the encoder in each stage encodes the input characteristics by combining the spatial depth convolution and the residual error structure, strengthens semantic information and obtains the output characteristics of the encoder; the output characteristics of the encoders at other stages except the final stage are transmitted to the encoder at the next stage after passing through the downsampling layer; the output characteristics of the encoder of each stage and the output characteristics of the decoder of the next stage are input into the decoder of the stage together after passing through an up-sampling layer; the output characteristics of the first stage decoder pass through the probability that each point in the SegHead output point cloud belongs to each class;
the encoder comprises at least one SDR block, wherein a DFA module can be selectively added in the encoder, and the DFA module is positioned on the input side of the SDR block;
the input features of the DFA module are sequentially subjected to a downsampling layer, a linear layer, an activation function, SDC operation, the linear layer, the activation function and an upsampling layer from the input side to the output side to obtain intermediate features of the DFA module, the input features of the DFA module are combined with the intermediate features of the DFA module, and the output features of the DFA module are obtained through the linear layer and the activation function;
the input features of the SDR block sequentially pass through a linear layer, an SDC operation, and two further linear layers from the input side to the output side to obtain the intermediate features of the SDR block; the input features of the SDR block are added to the intermediate features of the SDR block through a shortcut, and the output features of the SDR block are then obtained through an activation function;
the decoder of the last stage comprises a linear layer, and the decoders of the other stages comprise, in order from the input side to the output side, a Concat function and a linear layer, the Concat function merging the output features of the encoder of the same stage with the output features of the decoder of the next stage, and an activation function being selectively addable after the linear layer;
s2, acquiring point cloud data of a preset scene to obtain a point set P= { P 1 ,p 2 ,…,p i ,…,p N Feature set f= { F corresponding to the point set } 1 ,f 2 ,…,f i ,…,f N P, where i And f i Respectively three-dimensional coordinates and characteristics of an ith point in the point cloud, wherein N is the number of points in the point cloud;
s3, inputting the point set and the feature set into the semantic segmentation model;
s4, obtaining the probability of each point in the semantic segmentation model output point cloud;
s5, selecting the classification with the maximum probability of each point as a prediction label, and obtaining a point cloud data segmentation result of the preset scene according to the prediction label.
2. The large scale point cloud semantic segmentation method combining spatial depth convolution and residual structure according to claim 1, wherein: the sampling mode of the up-sampling layer is nearest neighbor interpolation.
3. The large scale point cloud semantic segmentation method combining spatial depth convolution and residual structure according to claim 1, wherein: when the dimensions of the input feature and the output feature of the SDR block are equal, the input feature of the SDR block is the shortcut; and when the dimensionalities of the input features and the output features of the SDR block are unequal, the input features of the SDR block are expanded by a linear layer to obtain the shortcut.
4. The large-scale point cloud semantic segmentation method combining spatial depth convolution and a residual structure as claimed in claim 1 or 3, wherein: a normalization process or an activation function may be selectively added after the linear layer or the SDC operation.
5. The large-scale point cloud semantic segmentation method combining spatial depth convolution and a residual structure according to claim 4, wherein: the SDC operation is a spatial depth convolution, and the feature f'_i of the i-th point after convolution is given by:

f'_i = Σ_{j∈Ω_i} g(Δp_{i,j}/σ) ⊙ f_j

wherein p_i and p_j are respectively the three-dimensional coordinates of the i-th and j-th points, Δp_{i,j} = p_i − p_j, σ is the variance of the data Δp_{i,j}, f_j is the feature of the j-th point in the point cloud, Ω_i is the neighbor index set of the i-th point in the point cloud, j is an element of Ω_i, ⊙ denotes element-wise multiplication, and the continuous function g(·): R^3 → R^I converts a three-dimensional vector in R^3 into an I-dimensional vector in R^I, where I is the dimension of the input features of the SDC operation.
6. The large scale point cloud semantic segmentation method combining spatial depth convolution and residual structure according to claim 1, wherein: the SegHead comprises three linear layers and one softmax layer in order from the input side to the output side.
CN202011048758.4A 2020-09-29 2020-09-29 Large-scale point cloud semantic segmentation method combining spatial depth convolution and residual error structure Active CN112215231B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011048758.4A CN112215231B (en) 2020-09-29 2020-09-29 Large-scale point cloud semantic segmentation method combining spatial depth convolution and residual error structure


Publications (2)

Publication Number Publication Date
CN112215231A (en) 2021-01-12
CN112215231B (en) 2024-03-08

Family

ID=74051996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011048758.4A Active CN112215231B (en) 2020-09-29 2020-09-29 Large-scale point cloud semantic segmentation method combining spatial depth convolution and residual error structure

Country Status (1)

Country Link
CN (1) CN112215231B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112819833B (en) * 2021-02-05 2022-07-12 四川大学 Large scene point cloud semantic segmentation method
CN112990010B (en) * 2021-03-15 2023-08-18 深圳大学 Point cloud data processing method and device, computer equipment and storage medium
US11875424B2 (en) 2021-03-15 2024-01-16 Shenzhen University Point cloud data processing method and device, computer device, and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109410307A (en) * 2018-10-16 2019-03-01 大连理工大学 A kind of scene point cloud semantic segmentation method
CN110570429A (en) * 2019-08-30 2019-12-13 华南理工大学 Lightweight real-time semantic segmentation method based on three-dimensional point cloud

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11004202B2 (en) * 2017-10-09 2021-05-11 The Board Of Trustees Of The Leland Stanford Junior University Systems and methods for semantic segmentation of 3D point clouds




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant