CN114743007A - Three-dimensional semantic segmentation method based on channel attention and multi-scale fusion

Three-dimensional semantic segmentation method based on channel attention and multi-scale fusion

Info

Publication number
CN114743007A
Authority
CN
China
Prior art keywords
convolution
point cloud
layer
channel
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210418602.3A
Other languages
Chinese (zh)
Inventor
张莹 (Zhang Ying)
孙月 (Sun Yue)
张露露 (Zhang Lulu)
王玉 (Wang Yu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiangtan University
Original Assignee
Xiangtan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiangtan University filed Critical Xiangtan University
Priority to CN202210418602.3A priority Critical patent/CN114743007A/en
Publication of CN114743007A publication Critical patent/CN114743007A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of three-dimensional point cloud data processing, and discloses a three-dimensional point cloud semantic segmentation method based on channel attention and multi-scale fusion. First, the point cloud data to be segmented are read, preprocessed, and input into the segmentation network. The data then pass sequentially through four modules, each consisting of an encoder and a channel attention layer, where the encoder comprises a downsampling layer, a grouping layer and a position-adaptive convolution. A multi-scale convolution context module extracts the point cloud context information, after which the features pass sequentially through four decoders, each consisting of an upsampling layer and a unit PointNet network. The final segmentation result is obtained through a fully connected layer of size k (the number of classes). The invention not only makes full use of the position information of the point cloud, but also introduces a channel attention layer that recalibrates the point cloud features along the channel dimension, paying more attention to the channel information useful for the segmentation task. It further provides a multi-scale convolution context module that captures features of different scales in parallel using dilated (hole) convolutions with the same dilation rate but different kernel sizes, thereby improving the segmentation result.

Description

Three-dimensional semantic segmentation method based on channel attention and multi-scale fusion
Technical Field
The invention belongs to the technical field of three-dimensional point cloud data processing, and particularly relates to a three-dimensional point cloud semantic segmentation method based on channel attention and multi-scale fusion.
Background
With the development and rise of artificial intelligence technology, 3D point cloud data analysis has drawn extensive attention. Compared with two-dimensional images, 3D point clouds contain richer three-dimensional spatial information, are not affected by external factors such as illumination and viewing angle, and can depict a model accurately and comprehensively. As a key component of scene understanding, 3D point cloud segmentation is one of the frontier research directions in artificial intelligence and is widely applied in fields such as robotics, virtual reality, autonomous driving, and laser remote sensing.
Point cloud segmentation methods can be divided into traditional point cloud segmentation and point cloud semantic segmentation. Traditional point cloud segmentation uses information such as the position and shape of the point cloud to delineate boundaries between regions, and mainly comprises edge-based, region-based, and model-fitting-based methods. The segmentation results obtained by these methods contain no semantic information; the results must be labeled semantically by hand, which is extremely inefficient at large data scales. Point cloud semantic segmentation builds on traditional point cloud segmentation by automatically assigning semantic labels to objects of different types in three-dimensional space, so that each object carries specific category information. Deep learning is currently the main means of implementation, and the processing approaches fall into the following three types:
(1) Voxel-based methods divide the three-dimensional scene into voxel grids, convert the original three-dimensional point cloud into voxels, and then process them with a three-dimensional convolutional network. However, three-dimensional points are mostly concentrated on object surfaces and become very sparse after conversion to voxels, so the time and space utilization of a dense convolutional network is low; moreover, some information is lost during voxelization, which affects network performance.
(2) Multi-view-projection-based methods first project the three-dimensional object into several views, then extract image features with a conventional two-dimensional convolutional neural network to recognize and analyze the target. Because of object occlusion in real scenes, part of the information is lost once an object is projected onto a two-dimensional plane, the spatial structure information contained in the three-dimensional data cannot be fully exploited, and the choice of projection plane also influences the result of the algorithm.
(3) Methods based on the raw point cloud process the three-dimensional point cloud data in the scene directly, without resorting to intermediate data types. Compared with the former two approaches, these methods require less memory and incur no information loss; features are extracted mainly with multilayer perceptrons or with convolution methods suited to point cloud data.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a three-dimensional point cloud semantic segmentation method based on channel attention and multi-scale fusion, thereby improving segmentation accuracy.
The invention discloses a three-dimensional point cloud semantic segmentation method based on channel attention and multi-scale fusion, which comprises the following steps of:
step 1, reading and preprocessing point cloud data;
step 2, passing the point cloud data through an encoder consisting of a down-sampling layer, a grouping layer and a position-adaptive convolution, mainly responsible for down-sampling and feature extraction;
and 2.1, downsampling the point cloud by using a downsampling layer.
And 2.2, dividing the point set obtained in the last step into a plurality of areas by using a grouping layer.
And 2.3, extracting initial features of each region by using a position self-adaptive convolution method.
Step 3, recalibrating the point cloud features with a channel attention layer, modeling the correlation among channel feature information, and adjusting the relative weight of different features in the overall feature representation by learning their weight values;
and 4, repeating the steps 2 to 3 for 4 times, and performing down-sampling layer by layer to extract point cloud characteristics.
And 5, inputting the feature vector output by the last channel attention layer into a multi-scale convolution context module, which samples the features in parallel with dilated (hole) convolutions of the same dilation rate but different kernel sizes, gradually enlarging the receptive field and compensating for lost detail information.
And 6, passing the feature vector output by the multi-scale convolution context module through a decoder consisting of an up-sampling layer and a unit PointNet network, mainly responsible for up-sampling and feature decoding; the features of the corresponding encoder stage are taken as the other input of the decoder through a skip connection.
And 6.1, upsampling the point cloud characteristics by using an upsampling layer.
And 6.2, decoding the characteristics by using a unit PointNet network.
And 7, repeating the step 6 for 4 times, and up-sampling and decoding the point cloud characteristics layer by layer.
And 8, obtaining classification scores for the k classes through a fully connected layer of size k (the number of classes), thereby obtaining the segmentation result.
Compared with the prior art, the invention has the following advantages:
(1) The invention extracts point cloud features with a position-adaptive convolution instead of an ordinary multilayer perceptron, constructing the convolution kernel in a dynamic, data-driven manner; this makes full use of the position information of the points and adapts flexibly to the irregular geometric structure of the 3D point cloud.
(2) The invention introduces a channel attention layer, which can fully exploit the channel information of the features, increasing the weight of information that contributes strongly to the network model and, conversely, decreasing the weight of features carrying less information, so that the model recalibrates the features.
(3) The invention provides a multi-scale convolution context module for extracting point cloud context information. It captures features of different scales in parallel using dilated (hole) convolutions with the same dilation rate but different kernel sizes, improving the segmentation result.
Drawings
FIG. 1 is a schematic diagram of the three-dimensional point cloud semantic segmentation network structure of the present invention.
FIG. 2 is a schematic diagram of the encoder and decoder of FIG. 1.
FIG. 3 is a schematic diagram of the structure of the position-adaptive convolution of FIG. 2.
FIG. 4 is a flow chart of the channel attention layer of FIG. 1.
FIG. 5 is a diagram comparing dilated (hole) convolution and standard convolution in the multi-scale convolution context module.
FIG. 6 is a block diagram of the structure of the context module of FIG. 1.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
To address the problems of raw-point-cloud segmentation methods based on deep learning, the invention provides a three-dimensional point cloud semantic segmentation method based on channel attention and multi-scale fusion; the network structure is shown in FIG. 1. As in image segmentation methods, the attention layer focuses on the channel information most useful to the task, and the multi-scale fusion module further samples the features with dilated convolutions of different receptive field sizes, emphasizing otherwise ignored local information; meanwhile, preliminary features are extracted with a position-adaptive convolution better suited to point cloud data. The specific implementation process is as follows:
step 1, reading and preprocessing point cloud data;
the existing point cloud data set is mainly divided into an indoor scene and an outdoor scene, wherein the indoor data set comprises S3DIS, ScanNet, Semantics and the like, and the storage formats mainly comprise TXT, PLY, OBJ and BIN, so that the point cloud format is unified and data is read at first, and then the point cloud data is simplified on the basis of keeping geometric characteristics through preprocessing operations such as rotation, denoising and the like, and a stable data basis is provided for subsequent processing.
And 2, passing the point cloud data through an encoder composed of a down-sampling layer, a grouping layer and a position-adaptive convolution, as shown on the left side of FIG. 2.
And 2.1, downsampling the point cloud by using a downsampling layer.
Given input points $\{x_1, x_2, \ldots, x_n\}$, the farthest point sampling (FPS) method is used to select a subset of $m$ center points $\{x_{i_1}, x_{i_2}, \ldots, x_{i_m}\}$ such that each $x_{i_j}$ is the point farthest from the already selected set $\{x_{i_1}, \ldots, x_{i_{j-1}}\}$ among the remaining points. Compared with random sampling, for the same number of centroids FPS covers the entire point set better and generates receptive fields in a data-dependent manner.
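For illustration, a minimal PyTorch sketch of farthest point sampling is given below; the function name, tensor shapes, and the random choice of the first centroid are assumptions for this sketch, not details fixed by the invention.

```python
# A minimal sketch of farthest point sampling (FPS) in PyTorch.
import torch

def farthest_point_sample(xyz: torch.Tensor, m: int) -> torch.Tensor:
    """xyz: (B, N, 3) point coordinates; returns (B, m) indices of the centroids."""
    B, N, _ = xyz.shape
    centroids = torch.zeros(B, m, dtype=torch.long, device=xyz.device)
    # Distance from every point to its nearest already-selected centroid.
    dist = torch.full((B, N), float("inf"), device=xyz.device)
    farthest = torch.randint(0, N, (B,), device=xyz.device)  # random first pick
    batch = torch.arange(B, device=xyz.device)
    for i in range(m):
        centroids[:, i] = farthest
        centroid = xyz[batch, farthest].unsqueeze(1)          # (B, 1, 3)
        dist = torch.minimum(dist, ((xyz - centroid) ** 2).sum(-1))
        farthest = dist.argmax(dim=-1)                        # next farthest point
    return centroids
```

Each iteration greedily picks the point farthest from everything chosen so far, which is what yields the data-dependent coverage described above.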
And 2.2, dividing the point set obtained in the last step into a plurality of areas by using a grouping layer.
The inputs to this layer are a point set of size $N \times (d + C)$ and centroid coordinates of size $N' \times d$. The output is a set of point groups of size $N' \times K \times (d + C)$, where each group corresponds to a local region and $K$ is the number of points near the centroid. The grouping uses a ball query, which selects up to $K$ points within a given radius, with the query radius measured in metric distance; the number of points actually found may differ between local regions. Compared with k-nearest-neighbor (kNN) search, the ball query guarantees a fixed region scale, making local region features more generalizable across space.
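A corresponding sketch of the ball-query grouping follows, under the same shape assumptions as the FPS sketch; padding groups that contain fewer than K in-radius points by repeating the first valid neighbor is a common convention and is assumed here.

```python
# A sketch of the ball query grouping step.
import torch

def ball_query(radius: float, K: int, xyz: torch.Tensor,
               centroids: torch.Tensor) -> torch.Tensor:
    """xyz: (B, N, 3); centroids: (B, M, 3); returns (B, M, K) neighbor indices."""
    B, N, _ = xyz.shape
    M = centroids.shape[1]
    sqrdist = torch.cdist(centroids, xyz) ** 2                # (B, M, N)
    idx = torch.arange(N, device=xyz.device).expand(B, M, N).clone()
    idx[sqrdist > radius ** 2] = N                            # mark points outside the ball
    idx = idx.sort(dim=-1).values[:, :, :K]                   # keep up to K in-ball points
    first = idx[:, :, :1].repeat(1, 1, K)
    mask = idx == N
    idx[mask] = first[mask]                                   # pad groups that came up short
    return idx
```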
And 2.3, extracting initial features for each region by using a position adaptive convolution method (PAConv).
As shown in FIG. 3, PAConv first defines a weight bank composed of weight matrices; a scoring network (ScoreNet) then learns a coefficient vector from the point positions, and finally a dynamic kernel is generated by combining the weight matrices with their associated position-adaptive coefficients. The resulting convolution kernel is applied to the input features, and the output features are obtained through max pooling. The detailed process is as follows:

The weight bank $B = \{B_m \mid m = 1, \ldots, M\}$ is generated by random initialization, where each $B_m$ represents a weight matrix and $M$ is the number of matrices. ScoreNet is responsible for associating the relative positions of the points with the weight matrices. Given a center point $p_i$ and a neighboring point $p_j$ with positional relation $(p_i, p_j) \in \mathbb{R}^{D_{in}}$, ScoreNet predicts the position-adaptive coefficients of $B_m$ according to equation (1):

$$S_{ij} = \alpha(\theta(p_i, p_j)) \qquad (1)$$

In equation (1), $\theta$ denotes a multilayer perceptron (MLP) and $\alpha$ is a normalization operation implemented with the softmax function. The output vector is $S_{ij} = (S_{ij}^{1}, \ldots, S_{ij}^{M})$, where $S_{ij}^{m}$ is the coefficient of $B_m$ when constructing the kernel $K(p_i, p_j)$ and $M$ is the number of weight matrices. The softmax function keeps each value between 0 and 1, ensuring that every weight matrix is selected with some probability; the larger the value, the stronger the relation between the position input and that weight matrix. The kernel of PAConv is obtained from equation (2) by combining the weight matrices in the weight bank with the position-adaptive coefficients predicted by ScoreNet:

$$K(p_i, p_j) = \sum_{m=1}^{M} S_{ij}^{m} B_m \qquad (2)$$

Finally, the generated kernel is applied to the input features according to equation (3), and a new feature vector is obtained through max pooling:

$$P_{out} = \mathrm{MAX}\big(\{K(p_i, p_j) \cdot P_{in}^{j}\}\big) \qquad (3)$$

In equation (3), $K$ denotes the convolution kernel, $\mathrm{MAX}$ denotes the max pooling operation, and $P_{in}$ and $P_{out}$ denote the input and output features, respectively.
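The following PyTorch sketch illustrates formulas (1) to (3). The ScoreNet width, the weight-bank initialization scale, and the encoding of the positional relation $(p_i, p_j)$ as a single concatenated vector are illustrative assumptions rather than details specified by the invention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PAConvSketch(nn.Module):
    """A sketch of position-adaptive convolution per formulas (1)-(3)."""

    def __init__(self, d_in: int, d_out: int, M: int = 8, pos_dim: int = 3):
        super().__init__()
        # Weight bank B = {B_m}, randomly initialized: (M, d_in, d_out).
        self.bank = nn.Parameter(0.1 * torch.randn(M, d_in, d_out))
        # ScoreNet (theta): an MLP over the positional relation of (p_i, p_j).
        self.scorenet = nn.Sequential(
            nn.Linear(2 * pos_dim, 16), nn.ReLU(), nn.Linear(16, M))

    def forward(self, pos_rel: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        """pos_rel: (B, N, K, 2*pos_dim) positional relations;
        feats: (B, N, K, d_in) grouped input features; returns (B, N, d_out)."""
        S = F.softmax(self.scorenet(pos_rel), dim=-1)            # formula (1)
        kernel = torch.einsum("bnkm,mio->bnkio", S, self.bank)   # formula (2)
        out = torch.einsum("bnki,bnkio->bnko", feats, kernel)    # apply the kernel
        return out.max(dim=2).values                             # formula (3): max pooling
```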
And 3, recalibrating the point cloud features with the channel attention layer (L_SE layer).
The L_SE layer consists of three parts: Squeeze, Excitation and Reweight. Squeeze compresses the features along the spatial dimension, turning each feature channel into a single real number that has, to some extent, a global receptive field; the output dimension equals the number of input feature channels. Excitation generates a weight for each feature channel based on the correlation between channels, representing the importance of that channel. Reweight treats the weights output by Excitation as the importance of each feature channel and multiplies them channel by channel onto the earlier features, completing the recalibration of the original features along the channel dimension. The detailed process is as follows:
For point cloud data, Squeeze is implemented by one-dimensional global average pooling, as shown in equation (4), completing the correlation statistics of information among the feature mapping channels:

$$P_{avg} = \mathrm{AvgPool1D}(P_{in}) \qquad (4)$$

On the basis of the information obtained by the Squeeze operation, to further capture the correlation between channels, an operation is performed with a sigmoid activation function, as shown in equation (5):

$$P_s = \sigma(L(\delta(L(P_{avg})))) \qquad (5)$$

In equation (5), $\sigma$ denotes the sigmoid function, $L$ a Linear function, and $\delta$ the Leaky_ReLU activation function. During back propagation, unlike the ReLU function of the original network, the Leaky_ReLU activation function selected by the invention also yields a gradient where the input is below zero, alleviating the dying-neuron problem that arises when a ReLU output is 0; the two functions are shown in equations (6) and (7):

$$\mathrm{ReLU}(x) = \max(0, x) \qquad (6)$$

$$\mathrm{Leaky\_ReLU}(x) = \max(\alpha x, x) \qquad (7)$$

To reduce the complexity of the network model and improve its adaptability to different data, the first Linear function reduces the input channel dimension to $C/r$ (with $r$ the reduction ratio); after the Leaky_ReLU activation, the second Linear function expands the dimension back so that it matches the original input dimension. Finally, the result is fed into the sigmoid function to normalize the weight values to between 0 and 1, and the weights are applied to the original channel information through equation (8) to complete the recalibration:

$$P_{out} = P_s \otimes P_{in} \qquad (8)$$

$P_{out}$ in equation (8) is the new feature output by the L_SE layer; the calculation process is shown in FIG. 4.
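A compact PyTorch sketch of the L_SE layer per formulas (4) to (8) follows; the reduction ratio r and the Leaky_ReLU slope are assumed hyperparameters.

```python
import torch
import torch.nn as nn

class LSELayer(nn.Module):
    """A sketch of the channel attention (L_SE) layer per formulas (4)-(8)."""

    def __init__(self, channels: int, r: int = 4, slope: float = 0.01):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // r),   # reduce the channel dimension to C/r
            nn.LeakyReLU(slope),                  # keeps a gradient where x < 0
            nn.Linear(channels // r, channels),   # restore the original dimension
            nn.Sigmoid())                         # normalize the weights into (0, 1)

    def forward(self, p_in: torch.Tensor) -> torch.Tensor:
        """p_in: (B, C, N) point features; returns recalibrated (B, C, N) features."""
        p_avg = p_in.mean(dim=-1)                 # formula (4): 1D global average pooling
        p_s = self.fc(p_avg)                      # formula (5): per-channel weights
        return p_in * p_s.unsqueeze(-1)           # formula (8): channel-wise reweighting
```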
And 4, repeating the steps 2 to 3 for 4 times, down-sampling layer by layer to extract point cloud features.
And 5, extracting detail information by using a multi-scale convolution context (MSCC) module.
MSCC is designed to extract rich point cloud features. Unlike standard convolution, the invention selects one-dimensional dilated (hole) convolution. Dilated convolution is in effect a process of sampling the point cloud features, with the sampling interval set by the dilation rate parameter (rate). When rate = 1, feature sampling loses no information and reduces to the standard convolution operation; when rate > 1, a sample is taken every (rate − 1) points of the raw data, enlarging the receptive field. The actual kernel size K is calculated according to equation (9):

$$K = \text{kernel\_size} + (\text{kernel\_size} - 1)(\text{rate} - 1) \qquad (9)$$

In equation (9), kernel_size is the initial kernel size. Thus when standard convolution is selected, K equals kernel_size, while K for dilated convolution is larger; the comparison is shown in FIG. 5.
Dilated convolution enlarges the receptive field without reducing the spatial dimension or increasing the number of parameters, achieving a balance between accuracy and speed. The size of the output point cloud after convolution is calculated according to equation (10):

input: $(B, C_{in}, N_{in})$
output: $(B, C_{out}, N_{out})$

$$N_{out} = \left\lfloor \frac{N_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\_size} - 1) - 1}{\text{stride}} \right\rfloor + 1 \qquad (10)$$

In equation (10), $N$ is the number of points and dilation denotes the rate. For the different convolution kernel sizes, to keep the output length $N$ unchanged, the stride is 1, the dilation is set to 2 and the padding is set to (kernel_size − 1).
The structure of MSCC is shown in FIG. 6: global information is first obtained with a standard convolution of kernel size 1, and parallel sampling is then performed with dilated convolutions of dilation rate 2 and kernel sizes 3, 5 and 7, respectively. Context features are thus extracted with different receptive fields, strengthening the relation between adjacent point clouds.
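A sketch of the MSCC module under these settings is given below; the description does not spell out how the four branches are fused, so the concatenation followed by a 1×1 convolution is an assumption.

```python
import torch
import torch.nn as nn

class MSCCSketch(nn.Module):
    """A sketch of the multi-scale convolution context module: a kernel-size-1
    standard branch plus parallel dilated branches (rate 2, kernel sizes 3/5/7);
    with stride 1 and padding = kernel_size - 1, each branch preserves N per
    formula (10)."""

    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.branches = nn.ModuleList([nn.Conv1d(c_in, c_out, kernel_size=1)])
        for k in (3, 5, 7):
            self.branches.append(
                nn.Conv1d(c_in, c_out, kernel_size=k, dilation=2, padding=k - 1))
        # Fusing the branches by concatenation + 1x1 convolution is an assumption.
        self.fuse = nn.Conv1d(4 * c_out, c_out, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """x: (B, C_in, N) -> (B, C_out, N)."""
        return self.fuse(torch.cat([branch(x) for branch in self.branches], dim=1))
```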
And 6, passing the feature vector output by the multi-scale convolution context module through a decoder consisting of an up-sampling layer and a unit PointNet network, as shown on the right side of FIG. 2; the output of the corresponding encoder stage is taken as the other input of the decoder through a skip connection.
And 6.1, upsampling the point cloud characteristics by using an upsampling layer.
Up-sampling is performed by interpolation to restore the original point cloud scale. Based on the coordinates of the center points, inverse-distance-weighted interpolation over the k nearest neighbors with k = 3 is used, as shown in equation (11):

$$f^{(j)}(x) = \frac{\sum_{i=1}^{k} w_i(x) f_i^{(j)}}{\sum_{i=1}^{k} w_i(x)}, \quad \text{where } w_i(x) = \frac{1}{d(x, x_i)^2}, \; k = 3 \qquad (11)$$
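A sketch of this interpolation step follows; the inverse-square distance weighting matches equation (11) as reconstructed above, and the tensor shapes are assumptions of this sketch.

```python
import torch

def interpolate_features(xyz_dst: torch.Tensor, xyz_src: torch.Tensor,
                         feats_src: torch.Tensor, k: int = 3) -> torch.Tensor:
    """Inverse-distance-weighted kNN interpolation per equation (11).
    xyz_dst: (B, N, 3) points to restore; xyz_src: (B, M, 3) known points;
    feats_src: (B, M, C) known features; returns (B, N, C)."""
    d, idx = torch.cdist(xyz_dst, xyz_src).topk(k, dim=-1, largest=False)
    w = 1.0 / (d ** 2 + 1e-8)                     # w_i(x) = 1 / d(x, x_i)^2
    w = w / w.sum(dim=-1, keepdim=True)           # normalize over the k neighbors
    B, N, _ = idx.shape
    batch = torch.arange(B, device=idx.device).view(B, 1, 1).expand(B, N, k)
    neighbors = feats_src[batch, idx]             # (B, N, k, C)
    return (w.unsqueeze(-1) * neighbors).sum(dim=2)
```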
and 6.2, decoding the characteristics by using a PointNet network unit.
The PointNet network mainly comprises a conversion network (T-Net) and a multilayer perceptron (MLP). T-Net is used to generate a transformation matrix and apply this transformation directly to the coordinates of the input points, specifically using two-dimensional regularization, and in order to keep the point cloud rotation invariant, as much as possible using orthogonal matrices, as shown in equation (12). The T-Net network is used for aligning the features, so that the features are more beneficial to extraction.
Preg=||I-AAT||2 (12)
P in the formula (10)regAnd I is an identity matrix corresponding to the dimension of the input matrix, and A is a feature matrix needing to be converted.
MLP is a neural network model composed of an input layer, a hidden layer and an output layer, with the output hw,b(x) Where w represents the inter-layer weight matrix and b represents the offset. The number of the unit PointNet networks is 3 or 4, and the dimensionality reduction is carried out on the feature vectors in sequence.
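A short sketch of the orthogonality regularizer of equation (12); averaging the penalty over the batch is an assumption of this sketch.

```python
import torch

def tnet_regularizer(A: torch.Tensor) -> torch.Tensor:
    """Equation (12): P_reg = ||I - A A^T||^2, pushing the learned feature
    transform A of shape (B, d, d) toward an orthogonal matrix."""
    I = torch.eye(A.size(-1), device=A.device).expand_as(A)
    diff = I - torch.bmm(A, A.transpose(1, 2))
    return (diff ** 2).sum(dim=(1, 2)).mean()     # batch mean is an assumption
```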
And 7, repeating the step 6 for 4 times, up-sampling and decoding the point cloud features layer by layer.
And 8, obtaining classification scores for the k classes through a fully connected layer of size k (the number of classes), thereby obtaining the segmentation result.
Examples
The dataset used in this embodiment is the S3DIS dataset, collected from the indoor environments of three different buildings and containing 271 rooms across 6 areas. There are 695,878,620 points in total; each point has corresponding coordinate and color information and one of 13 semantic labels, such as chair, table, floor and wall. This embodiment selects areas 1, 2, 3, 4 and 6 for training and area 5 for testing. During training, the input is sampled to a uniform 4096 points, while all points are used during testing.
In this embodiment, 150 epochs are trained on two GeForce RTX 2080 Ti GPUs with a batch size of 16, using an SGD optimizer with an initial learning rate of 0.05, a momentum of 0.9, and a weight decay of 10⁻⁴; the method is implemented with PyTorch on Linux. After the network is trained on the training set, model performance is evaluated on the test set, with mIoU (mean intersection over union) as the evaluation metric. The IoU (intersection over union) of each category on the S3DIS dataset is shown in Table 1; the mIoU reaches 64.8, showing that the method achieves good segmentation performance in the three-dimensional point cloud semantic segmentation task.
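For illustration, a sketch of the optimizer configuration described above is given below; `model` is a stand-in placeholder, not the segmentation network of the invention.

```python
import torch

# `model` is a placeholder module standing in for the segmentation network.
model = torch.nn.Linear(9, 13)
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.05,            # initial learning rate
                            momentum=0.9,
                            weight_decay=1e-4)  # weight decay of 10^-4
# Training runs for 150 epochs with batch size 16; each input block is
# sampled to 4096 points during training, and all points are used at test time.
```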
Table 1: IoU results for each category on S3DIS dataset
[Table 1 image: per-category IoU results on the S3DIS dataset]
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (4)

1. A three-dimensional point cloud semantic segmentation method based on channel attention and multi-scale fusion is characterized by comprising the following steps:
step 1, reading and preprocessing point cloud data;
step 2, passing the point cloud data through an encoder composed of a down-sampling layer, a grouping layer and a position-adaptive convolution, mainly responsible for down-sampling and feature extraction;
step 3, recalibrating the point cloud features with a channel attention layer, modeling the correlation among channel feature information, and adjusting the relative weight of different features in the overall feature representation by learning their weight values;
and 4, repeating the steps 2 to 3 for 4 times, and performing down-sampling layer by layer to extract point cloud characteristics.
And 5, inputting the feature vector output by the last channel attention layer into a multi-scale convolution context module, which samples the features in parallel with dilated (hole) convolutions of the same dilation rate but different kernel sizes, gradually enlarging the receptive field and compensating for lost detail information.
And 6, passing the feature vector output by the multi-scale convolution context module through a decoder consisting of an up-sampling layer and a unit PointNet network, mainly responsible for up-sampling and feature decoding; the features of the corresponding encoder stage are taken as the other input of the decoder through a skip connection.
And 7, repeating the step 6 for 4 times, and up-sampling and decoding the point cloud characteristics layer by layer.
And 8, obtaining classification scores for the k classes through a fully connected layer of size k (the number of classes), thereby obtaining the segmentation result.
2. The method of claim 1, wherein in step 2, the position-adaptive convolution first defines a weight bank composed of weight matrices; a scoring network (ScoreNet) then learns a coefficient vector from the point positions, and a dynamic kernel is generated by combining the weight matrices with their associated position-adaptive coefficients. The resulting convolution kernel is applied to the input features, and the output features are obtained through max pooling. The detailed process is as follows:
The weight bank $B = \{B_m \mid m = 1, \ldots, M\}$ is generated by random initialization, where each $B_m$ represents a weight matrix and $M$ is the number of matrices. ScoreNet is responsible for associating the relative positions of the points with the weight matrices. Given a center point $p_i$ and a neighboring point $p_j$ with positional relation $(p_i, p_j) \in \mathbb{R}^{D_{in}}$, ScoreNet predicts the position-adaptive coefficients of $B_m$ as:

$$S_{ij} = \alpha(\theta(p_i, p_j))$$

where $\theta$ denotes a multilayer perceptron (MLP) and $\alpha$ is a normalization operation implemented with the softmax function. The output vector is $S_{ij} = (S_{ij}^{1}, \ldots, S_{ij}^{M})$, where $S_{ij}^{m}$ is the coefficient of $B_m$ when constructing the kernel $K(p_i, p_j)$ and $M$ is the number of weight matrices. The softmax function keeps each value between 0 and 1, ensuring that every weight matrix is selected with some probability; the larger the value, the stronger the relation between the position input and that weight matrix. The kernel of PAConv is obtained by combining the weight matrices in the weight bank with the position-adaptive coefficients predicted by ScoreNet:

$$K(p_i, p_j) = \sum_{m=1}^{M} S_{ij}^{m} B_m$$

The generated kernel is applied to the input features, and a new feature vector is obtained through max pooling:

$$P_{out} = \mathrm{MAX}\big(\{K(p_i, p_j) \cdot P_{in}^{j}\}\big)$$

where $K$ denotes the convolution kernel, $\mathrm{MAX}$ denotes the max pooling operation, and $P_{in}$ and $P_{out}$ denote the input and output features, respectively.
3. The method of claim 1, wherein in step 3, the channel attention layer consists of three parts: Squeeze, Excitation and Reweight. Squeeze compresses the features along the spatial dimension, turning each feature channel into a single real number that has, to some extent, a global receptive field; the output dimension equals the number of input feature channels. Excitation generates a weight for each feature channel based on the correlation between channels, representing the importance of that channel. Reweight treats the weights output by Excitation as the importance of each feature channel and multiplies them channel by channel onto the earlier features, completing the recalibration of the original features along the channel dimension. The detailed process is as follows:
For point cloud data, Squeeze is implemented by one-dimensional global average pooling, completing the correlation statistics of information among the feature mapping channels:

$$P_{avg} = \mathrm{AvgPool1D}(P_{in})$$

On the basis of the information obtained by the Squeeze operation, to further capture the correlation between channels, an operation is performed with a sigmoid activation function:

$$P_s = \sigma(L(\delta(L(P_{avg}))))$$

where $\sigma$ denotes the sigmoid function, $L$ a Linear function, and $\delta$ the Leaky_ReLU activation function. During back propagation, unlike the ReLU function of the original network, the Leaky_ReLU function selected by the invention also yields a gradient where the input is below zero, alleviating the dying-neuron problem that arises when a ReLU output is 0:

$$\mathrm{ReLU}(x) = \max(0, x)$$

$$\mathrm{Leaky\_ReLU}(x) = \max(\alpha x, x)$$

To reduce the complexity of the network model and improve its adaptability to different data, the first Linear function reduces the input channel dimension to $C/r$ (with $r$ the reduction ratio); after the Leaky_ReLU activation, the second Linear function expands the dimension back so that it matches the original input dimension. Finally, the result is fed into the sigmoid function to normalize the weight values to between 0 and 1, and the weights are applied to the original channel information to complete the recalibration:

$$P_{out} = P_s \otimes P_{in}$$

where $P_{out}$ is the new feature output by the L_SE layer.
4. The method of claim 1, wherein in step 5, the multi-scale convolution context module is used to extract rich point cloud features; unlike standard convolution, the invention selects one-dimensional dilated (hole) convolution. Dilated convolution is in effect a process of sampling the point cloud features, with the sampling interval set by the dilation rate parameter (rate). When rate = 1, feature sampling loses no information and reduces to the standard convolution operation; when rate > 1, a sample is taken every (rate − 1) points of the raw data, enlarging the receptive field. The actual kernel size K is calculated according to the following formula:

$$K = \text{kernel\_size} + (\text{kernel\_size} - 1)(\text{rate} - 1)$$

where kernel_size is the initial kernel size. Thus when standard convolution is selected, K equals kernel_size, while K for dilated convolution is larger.
Dilated convolution enlarges the receptive field without reducing the spatial dimension or increasing the number of parameters, achieving a balance between accuracy and speed. The output point cloud size after convolution is:

input: $(B, C_{in}, N_{in})$
output: $(B, C_{out}, N_{out})$

$$N_{out} = \left\lfloor \frac{N_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\_size} - 1) - 1}{\text{stride}} \right\rfloor + 1$$

where $N$ is the number of points and dilation denotes the rate. For the different convolution kernel sizes, to keep the output length $N$ unchanged, the stride is 1, the dilation is set to 2 and the padding is set to (kernel_size − 1). Based on this setting, the multi-scale convolution context module first obtains global information with a standard convolution of kernel size 1, and then performs parallel sampling with dilated convolutions of dilation rate 2 and kernel sizes 3, 5 and 7, respectively. Context features are thus extracted with different receptive fields, strengthening the relation between adjacent point clouds.
CN202210418602.3A 2022-04-20 2022-04-20 Three-dimensional semantic segmentation method based on channel attention and multi-scale fusion Pending CN114743007A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210418602.3A CN114743007A (en) 2022-04-20 2022-04-20 Three-dimensional semantic segmentation method based on channel attention and multi-scale fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210418602.3A CN114743007A (en) 2022-04-20 2022-04-20 Three-dimensional semantic segmentation method based on channel attention and multi-scale fusion

Publications (1)

Publication Number Publication Date
CN114743007A true CN114743007A (en) 2022-07-12

Family

ID=82283487

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210418602.3A Pending CN114743007A (en) 2022-04-20 2022-04-20 Three-dimensional semantic segmentation method based on channel attention and multi-scale fusion

Country Status (1)

Country Link
CN (1) CN114743007A (en)


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115588090A (en) * 2022-10-19 2023-01-10 南京航空航天大学深圳研究院 Aircraft point cloud semantic extraction method with spatial relationship and characteristic information decoupled
CN115588090B (en) * 2022-10-19 2023-09-19 南京航空航天大学深圳研究院 Aircraft point cloud semantic extraction method for decoupling spatial relationship and characteristic information
CN115862013A (en) * 2023-02-09 2023-03-28 南方电网数字电网研究院有限公司 Attention mechanism-based power transmission and distribution scene point cloud semantic segmentation model training method
CN115862013B (en) * 2023-02-09 2023-06-27 南方电网数字电网研究院有限公司 Training method for power transmission and distribution electric field scenic spot cloud semantic segmentation model based on attention mechanism
CN116958553A (en) * 2023-07-27 2023-10-27 石河子大学 Lightweight plant point cloud segmentation method based on non-parametric attention and point-level convolution
CN116958553B (en) * 2023-07-27 2024-04-16 石河子大学 Lightweight plant point cloud segmentation method based on non-parametric attention and point-level convolution
CN117058380A (en) * 2023-08-15 2023-11-14 北京学图灵教育科技有限公司 Multi-scale lightweight three-dimensional point cloud segmentation method and device based on self-attention
CN117058380B (en) * 2023-08-15 2024-03-26 北京学图灵教育科技有限公司 Multi-scale lightweight three-dimensional point cloud segmentation method and device based on self-attention
CN117132501A (en) * 2023-09-14 2023-11-28 武汉纺织大学 Human body point cloud cavity repairing method and system based on depth camera
CN117132501B (en) * 2023-09-14 2024-02-23 武汉纺织大学 Human body point cloud cavity repairing method and system based on depth camera

Similar Documents

Publication Publication Date Title
CN114743007A (en) Three-dimensional semantic segmentation method based on channel attention and multi-scale fusion
Zou et al. Manhattan Room Layout Reconstruction from a Single 360° Image: A Comparative Study of State-of-the-Art Methods
CN111489358A (en) Three-dimensional point cloud semantic segmentation method based on deep learning
CN111368769B (en) Ship multi-target detection method based on improved anchor point frame generation model
CN107871106A (en) Face detection method and device
CN114841257B (en) Small sample target detection method based on self-supervision comparison constraint
CN111625667A (en) Three-dimensional model cross-domain retrieval method and system based on complex background image
CN111382300B (en) Multi-view three-dimensional model retrieval method and system based on pairing depth feature learning
CN110610210B (en) Multi-target detection method
CN106844620B (en) View-based feature matching three-dimensional model retrieval method
CN112329871B (en) Pulmonary nodule detection method based on self-correction convolution and channel attention mechanism
CN112784782B (en) Three-dimensional object identification method based on multi-view double-attention network
CN111860587A (en) Method for detecting small target of picture
Zhao et al. Character‐object interaction retrieval using the interaction bisector surface
CN111524140B (en) Medical image semantic segmentation method based on CNN and random forest method
CN116824585A (en) Aviation laser point cloud semantic segmentation method and device based on multistage context feature fusion network
Zhang et al. Joint information fusion and multi-scale network model for pedestrian detection
CN115311502A (en) Remote sensing image small sample scene classification method based on multi-scale double-flow architecture
CN114821341A (en) Remote sensing small target detection method based on double attention of FPN and PAN network
Fan et al. A novel sonar target detection and classification algorithm
CN117710760A (en) Method for detecting chest X-ray focus by using residual noted neural network
CN117237643A (en) Point cloud semantic segmentation method and system
CN113128564A (en) Typical target detection method and system based on deep learning under complex background
CN111597367A (en) Three-dimensional model retrieval method based on view and Hash algorithm
CN116524495A (en) Traditional Chinese medicine microscopic identification method and system based on multidimensional channel attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination