CN116310350B - Urban scene semantic segmentation method based on graph convolution and semi-supervised learning network - Google Patents
- Publication number
- CN116310350B (application CN202310596881.7A)
- Authority
- CN
- China
- Prior art keywords
- point
- points
- network
- category
- steps
- Prior art date
- 2023-05-25
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses an urban scene semantic segmentation method based on graph convolution and a semi-supervised learning network, which comprises the following steps: S1, pre-training a graph convolution network to obtain initialization parameters; S2, inputting an original point set $P$ at one time and outputting the feature vector $F_1$; S3, for the original point set $P$, computing the feature vector $F_2$ from the neighborhood of each point; S4, calculating the distance between the feature vectors $F_1$ and $F_2$ as a loss function to adjust the parameters of the graph convolution network; S5, using the labeled data to assign pseudo labels to the unlabeled data; S6, using the data with assigned pseudo labels for semantic segmentation and predicting the category of each point. The method can realize semantic segmentation of urban road scenes with only a small amount of labeled data.
Description
Technical Field
The invention relates to the field of computer graphics, in particular to an urban scene semantic segmentation method based on graph convolution and a semi-supervised learning network.
Background
As unstructured three-dimensional data, the point cloud characterizes objects more accurately and flexibly than data formats such as voxels and meshes, and has wide application in the field of smart cities. For example, in urban construction planning, digital traffic maps generated from point clouds assist the planning of traffic lines and urban construction, improving planning efficiency and precision; in environmental monitoring and analysis, point cloud data can be used for three-dimensional modeling of an actual scene, supporting the analysis of landforms, hydrogeology, building damage and other conditions, which facilitates city management and maintenance.
In practical smart city applications, the pipeline from point cloud acquisition to application can generally be divided into the following five steps: (1) point cloud acquisition; (2) point cloud preprocessing; (3) point cloud feature extraction; (4) point cloud semantic segmentation; (5) downstream model deployment and application.
One difficulty in the above pipeline is that feature extraction and semantic segmentation require a large amount of labeled data for model training. Traditional methods either extract features with hand-designed feature descriptors or, following the deep learning approach, extract features automatically with a neural network, and realize semantic segmentation on top of those features. However, the training process of these methods is usually supervised learning, i.e. a large amount of labeled data is required for model training. The point clouds of urban scenes are huge in scale, and manually labeling all points is cumbersome and expensive.
Disclosure of Invention
The invention aims to overcome the difficulty that urban scene semantic segmentation algorithms need a large amount of labeled data, and provides an urban scene semantic segmentation method based on graph convolution and a semi-supervised learning network.
The urban scene semantic segmentation method based on graph convolution and a semi-supervised learning network comprises the following steps:
S1, pre-training a graph convolution network by using a public, labeled urban road dataset to obtain the initialization parameters of each layer in the graph convolution network;
S2, inputting an original point set $P$ into the initialized graph convolution network at one time, where the points in $P$ contain only the coordinate (xyz) and color (rgb) information, and outputting the feature vector $F_1$;
S3, for each point in the original point set $P$ of step S2, using k-NN to find its k nearest points to form a neighborhood, and calculating the feature vector $F_2$ from the neighborhood of each point;
S4, calculating the distance between the feature vectors $F_1$ and $F_2$ as a loss function for adjusting the parameters of the graph convolution network in step S2;
S5, taking the original point set $P$ as the target semantic segmentation dataset $D$, which contains labeled data and unlabeled data, where the labeled data accounts for 1%-10% of the original point set $P$; then, in the semi-supervised learning network, using the labeled data to assign pseudo labels to the unlabeled data;
S6, using the data with pseudo labels assigned in step S5 for network inference, performing semantic segmentation and predicting the category of each point.
Further, the step S2 specifically includes:
S21, encoding the original point set $P$ with the encoder of the graph convolution network to obtain the encoding feature $F_E$;
S22, decoding the encoding feature $F_E$ with the decoder of the graph convolution network to obtain the decoding feature $F_D$;
S23, mapping the decoding feature $F_D$ through an MLP to the feature vector $F_1$; the dimension of each point in $F_1$ is expressed as $(q_1, q_2, q_3, r_1^1, r_2^1, r_3^1, r_1^2, r_2^2, r_3^2)$, where $q$ and $r$ represent the encoded coordinate and color features respectively, the subscript of $r$ denotes the feature channel, and in the superscript of $r$, 1 denotes the mean and 2 denotes the variance.
Further, the step S3 specifically includes:
For each point of the original point set $P$ in step S2, k-NN is used to find its k nearest points to form a neighborhood, and the feature vector $F_2$ is calculated from the neighborhood of each point, where the dimension of each point is expressed as $(\mu_1, \mu_2, \mu_3, \mu_{r_1}, \mu_{r_2}, \mu_{r_3}, \sigma_{r_1}^2, \sigma_{r_2}^2, \sigma_{r_3}^2)$;
The calculation process is as follows:
$$\mu_i = \frac{1}{k}\sum_{n=1}^{k} x_n^{(i)}, \qquad \mu_{r_i} = \frac{1}{k}\sum_{n=1}^{k} c_n^{(i)}, \qquad \sigma_{r_i}^2 = \frac{1}{k}\sum_{n=1}^{k}\left(c_n^{(i)} - \mu_{r_i}\right)^2$$
where $\mu_i$ represents the mean of the neighborhood coordinate channels of each point, $\mu_{r_i}$ the mean of the neighborhood color channels of each point, and $\sigma_{r_i}^2$ the variance of the neighborhood color channels of each point; $x_n^{(i)}$ and $c_n^{(i)}$ are the $i$-th coordinate and color channel of the $n$-th neighboring point; $i$ takes 1, 2, 3; the self-learning process is set up so that $F_1$ and $F_2$ correspond channel by channel and a feature distance is calculated for each feature channel; $n$ is the index over the k neighboring points of each point.
Further, the step S4 specifically includes:
Assume that the original point set $P$ input to the graph convolution network in step S2 contains $N$ points; the coordinate distance is calculated as the Euclidean distance, and the color distance as the Manhattan distance;
The loss function of the coordinate distance, $L_{xyz}$, is:
$$L_{xyz} = \frac{1}{N}\sum_{\alpha=1}^{N}\sqrt{\sum_{i=1}^{3}\left(F_1^{(i)}(\alpha) - F_2^{(i)}(\alpha)\right)^2}$$
The loss function of the color distance, $L_{rgb}$, is:
$$L_{rgb} = \frac{1}{N}\sum_{\alpha=1}^{N}\sum_{i=4}^{9}\left|F_1^{(i)}(\alpha) - F_2^{(i)}(\alpha)\right|$$
Finally, the loss function is:
$$L = \lambda_1 L_{xyz} + \lambda_2 L_{rgb}$$
where $\alpha$ is the index of each point in the original point set $P$, and $\lambda_1$ and $\lambda_2$ are two hyperparameters of the graph convolution network, set to 1/3 and 2/3 respectively; the loss function is used to train the graph convolution network in step S2 and further adjust the parameters of its encoder and decoder.
Further, the step S5 specifically includes:
S51, taking the original point set $P$ as the target semantic segmentation dataset $D$; $D$ is a set containing $N$ points; let the point set of labeled data in the original point set $P$ be $D_L$ with $N_L$ points, and the point set of unlabeled data be $D_U$ with $N_U$ points; then $D = D_L \cup D_U$ and $N_L + N_U = N$;
S52, using the encoder and decoder trained and adjusted in step S4, replacing the MLP that outputs 9 dimensions in step S4 with an MLP that outputs $d$ dimensions, and denoting the output $d$-dimensional vector as $f$;
S53, denoting the features of $f$ corresponding to labeled points as $f^L$ and the features corresponding to unlabeled points as $f^U$; then $f = f^L \cup f^U$;
where $f^L$ and $f^U$ are all $d$-dimensional vectors and indices are used to distinguish different points, and $D_L$ contains $C'$ categories, $0 < C' \le C$, where $C$ is the actual number of categories to be semantically segmented;
S54, selecting the points belonging to category $c$ from the known labeled data and calculating the feature average of these points to obtain the category average feature vector $\bar{v}_c$:
$$\bar{v}_c = \frac{1}{N_c}\sum_{j:\,y_j = c} f_j^L$$
where $N_c$ denotes the number of points of category $c$ and $y_j$ denotes the category of the point corresponding to $f_j^L$; then, for the $N$ points of the input, the average feature vector $\bar{v}_c$ is calculated for each of the $C'$ categories, where $1 \le c \le C'$; for the remaining categories not present in $D_L$, $\bar{v}_c$ is recorded as the zero vector;
S55, calculating the similarity matrix $sim$ between the feature vector $f_s^U$ of each point of the unlabeled data and $\bar{v}_c$:
$$sim_s^c = \exp\left(-\left\lVert \bar{v}_c - f_s^U \right\rVert_2\right)$$
where $\lVert \bar{v}_c - f_s^U \rVert_2$ represents the Euclidean distance between the category average feature vector and the vector corresponding to the unlabeled point; the superscript of $sim$ denotes the category and the subscript denotes a point, with $0 < sim_s^c \le 1$; $e$ is the base of the natural logarithm and the bracketed term is its exponent; the dimension of $sim$ is $N_U \times C$;
S56, mapping the feature vector $f$ of step S53 to the vector $z$ as the prediction result, where the dimension of $z$ at each point is $C$;
For the $N_L$ labeled points, category prediction is realized directly with a Softmax classifier and a cross entropy loss function; the loss function calculated on these points is $L_{seg}$;
For the $N_U$ unlabeled points, pseudo labels are first generated and then compared with $z$. Specifically: the $N_{top}$ points with the highest confidence are first selected from the similarity matrix $sim$ category by category; assume $S$ points are selected in total ($0 \le S \le N_U$); then the category with the highest confidence is selected point by point for the selected points, and the maximum confidence and the corresponding label values of the pseudo labels of the $S$ points are updated;
The prediction loss function of the $S$ unlabeled points is designed as:
$$L_{pse} = -\frac{1}{S}\sum_{s}\mathbb{1}\left(s \in \mathcal{S}\right)\sum_{m=1}^{C'}\mathbb{1}\left(\hat{y}_s = m\right)\log p_{s,m}$$
where the subscript $s$ represents any one of the $N_U$ unlabeled points, i.e. the index over the unlabeled points; $C'$ is the number of categories contained in $D_L$ and $m$ the index over the categories; $p_{s,m}$ represents the probability value of the final predicted label and $\hat{y}_s$ the pseudo label category; the indicator $\mathbb{1}(\hat{y}_s = m)$ takes 1 when the pseudo label category and the predicted category are the same and 0 otherwise, and $\mathbb{1}(s \in \mathcal{S})$ takes 1 when the point is among the $S$ selected points and 0 otherwise;
S57, the loss function $L_{total}$ of the whole graph convolution network is:
$$L_{total} = L_{seg} + w \, L_{pse}$$
where the weight $w$ is:
$$w = \frac{epoch}{max\text{-}epoch}$$
where $epoch$ represents the current training round and $max\text{-}epoch$ the maximum training round; at the beginning, a smaller weight $w$ is thus used.
Further, the step S6 specifically includes:
The network is trained by iterating the pseudo-label assignment process of steps S51-S57 until it converges on the target dataset $D$; in the final prediction process, the trained network is used with the calculation of the similarity matrix $sim$ removed; a Softmax classifier is applied to all points of $P$, then the remaining points of the dataset are read in and the process is iterated, realizing semantic segmentation and category prediction of all points in the target dataset $D$.
After the above technical scheme is adopted, the invention has the following advantages over the background art:
1. The invention adopts the idea of transfer learning and makes full use of the similar characteristics shared by different urban scenes: a publicly available labeled dataset is used to obtain the initialization parameters of the graph convolution network, which helps improve the stability of the neural network when representing different datasets;
2. The invention adopts a self-learning pre-training task that makes full use of the local and color characteristics of objects in urban scenes; it requires no labeled data and learns the prior distribution of the data to fine-tune the network parameters;
3. The invention uses semi-supervised learning to reduce the dependence on labeled data, so that high-quality pseudo labels can be generated from a small amount of labeled data; this improves the effect of semi-supervised learning, realizes semantic segmentation of the target dataset, and greatly reduces the dependence on manually labeled data.
Drawings
FIG. 1 is a flow chart of fine-tuning the parameters of the pre-trained graph convolution network according to the present invention;
FIG. 2 is a flow chart of the training process in which the semi-supervised learning network of the present invention generates pseudo labels.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Examples
The urban scene semantic segmentation method based on graph convolution and a semi-supervised learning network comprises the following steps:
(I) Obtaining the initialization parameters by pre-training the graph convolution network (realized by the following step S1)
S1, pre-training a graph convolution network by using a public and labeled data set to obtain initialization parameters of each layer in the graph convolution network;
The encoder of the graph convolution network consists of a multi-layer perceptron layer and four graph convolution modules, numbered 1, 2, 3, 4 in sequence; the input feature of a graph convolution module is expressed as $F_{t-1}$ and its output feature as $F_t$, where $t = 1, 2, 3, 4$, i.e. the output of the previous graph convolution module is the input of the next one; the current module takes $N_{t-1}$ points with input feature dimension $d_{t-1}$ as input; after the convolution, the feature dimension is $d_t$ and the number of points is reduced to $N_t$.
This embodiment adopts public, labeled urban road data as the pre-training dataset. The public dataset Toronto3D was acquired with a vehicle-mounted laser scanner and covers about 1 km of high-quality urban scene point cloud; more than eighty million points were manually labeled, covering 8 common urban scene categories: road, road marking, vegetation (natural), building, power line, pole, car and fence, and all points contain coordinate and color information. A point cloud $P$ is selected from Toronto3D as the input of the pre-trained graph convolution network; in this embodiment, the number of points in $P$ is 65536, so the dimension of $P$ is [65536, 6].
First, the encoder is used to obtain the encoding feature $F_E$: an MLP (i.e. multi-layer perceptron) maps [65536, 6] to [65536, 16], and the extracted features are then input into the four graph convolution modules. The decoding feature $F_D$ with dimension [65536, 16] is then obtained through the decoder. Next, a fully connected network and a Softmax classifier realize the category prediction for each point in $P$, with the per-point feature dimension change of the fully connected network set to (16, 64, 64, 8). Then, cross entropy is used as the loss function and stochastic gradient descent as the optimizer to pre-train the network and update the parameters of each layer. Finally, the above process is repeated until the network converges. The convergence condition is set to a fixed 100 training rounds, and training may be stopped early if the prediction accuracy does not improve for 20 consecutive rounds. Instead of random initialization, the network of this embodiment is initialized with the pre-trained parameters.
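The pre-training loop can be summarized in the following minimal sketch; the model class, the data loader and the hyperparameter names are illustrative assumptions, and only the cross entropy loss, the stochastic gradient descent optimizer and the 100-round / 20-round convergence rule come from the embodiment above:

```python
# Minimal pre-training sketch; GraphConvSegNet and the loader are assumed to
# exist elsewhere and yield (points, labels) batches.
import torch
import torch.nn as nn

def pretrain(model, loader, num_classes=8, max_epochs=100, patience=20, device="cuda"):
    model = model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    best_acc, stale = 0.0, 0
    for epoch in range(max_epochs):              # fixed 100 training rounds
        model.train()
        for points, labels in loader:            # points: [B, 65536, 6] (xyz + rgb)
            points, labels = points.to(device), labels.to(device)
            logits = model(points)               # [B, 65536, num_classes]
            loss = criterion(logits.reshape(-1, num_classes), labels.reshape(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # accuracy check; the embodiment does not name a validation split,
        # so the training loader is reused here for brevity
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for points, labels in loader:
                pred = model(points.to(device)).argmax(dim=-1).cpu()
                correct += (pred == labels).sum().item()
                total += labels.numel()
        acc = correct / total
        if acc > best_acc:
            best_acc, stale = acc, 0
        else:
            stale += 1
            if stale >= patience:                # no gain for 20 rounds: stop
                break
    return model
```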
(II) Performing the self-learning training task to fine-tune the parameters (realized by the following steps S2-S4)
S2, inputting an original point set into the initialized graph rolling network at one time,The points in (a) contain only the coordinate xyz and the color rgb information, and the feature vector +.>;
S3, the original point set in the step S2Using k-NN to find k adjacent points to form neighborhood, calculating feature vector according to neighborhood of each point>;
S4, calculating a feature vector andAs a loss function for adjusting the parameters of the graph roll-up network in step S2;
the step S2 specifically includes:
Since step S1 has already initialized the layer parameters of the graph convolution network, step S2 uses only its encoder and decoder parts; the fully connected network and Softmax classifier of step S1 are replaced with an MLP layer, and then the following steps are performed.
S21, encoding the original point set $P$ with the encoder of the graph convolution network to obtain the encoding feature $F_E$;
Specifically, for an original point set $P$ input into the graph convolution network at one time, where the points contain only the 6-dimensional features of coordinates xyz and colors rgb, denoted as feature $F_0$, $F_0$ is mapped to 16 dimensions by a multi-layer perceptron; the output feature serves as the input of the first graph convolution module, the output of each graph convolution module is the input of the next one, and after passing through the four graph convolution modules the feature $F_4$ is output, i.e. the final encoding feature $F_E$.
S22, decoding the encoding feature $F_E$ with the decoder of the graph convolution network to obtain the decoding feature $F_D$;
The encoding feature $F_E$ obtained in S21 is mapped by an MLP to the same-dimensional feature $D_4$, which serves as the decoder input; decoding proceeds through nearest-point upsampling, an MLP, and skip connections with the encoder, giving the decoding features $D_t$. The decoding features use the subscript $t$ and the superscript $D$ to distinguish them from the encoder features, with $t$ taking 4, 3, 2, 1 in sequence. The encoder is skip-connected to the decoder, i.e. an encoding feature and a decoding feature of the same dimension are added and the sum is used as the input feature of the subsequent layer. The decoding feature $D_1$ is the decoding feature $F_D$.
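A minimal sketch of one such decoder stage may look as follows; the module name, the per-point Linear-BatchNorm-ReLU mapping and the gather-based nearest-point upsampling are assumptions, while the same-dimension additive skip connection follows the text above:

```python
# One decoder stage: upsample coarse features to the finer point set by a
# nearest-point gather, map through Linear -> BatchNorm -> ReLU, and add the
# encoder feature of the same dimension (the skip connection).
import torch
import torch.nn as nn

class DecoderStage(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, out_dim),
                                 nn.BatchNorm1d(out_dim),
                                 nn.ReLU())

    def forward(self, feat, up_idx, skip_feat):
        # feat:      [n_coarse, in_dim]   features at the coarser level
        # up_idx:    [n_fine]             nearest coarse point per fine point
        # skip_feat: [n_fine, out_dim]    same-dimension encoder feature
        x = self.mlp(feat[up_idx])        # nearest-point upsampling + MLP
        return x + skip_feat              # skip connection by addition
```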
S23, mapping the decoding feature $F_D$ through an MLP to the feature vector $F_1$; the dimension of each point in $F_1$ is expressed as $(q_1, q_2, q_3, r_1^1, r_2^1, r_3^1, r_1^2, r_2^2, r_3^2)$, where $q$ and $r$ represent the encoded coordinate and color features respectively, the subscript of $r$ denotes the feature channel, and in the superscript of $r$, 1 denotes the mean and 2 denotes the variance.
The step S3 specifically comprises the following steps:
First, for each point of the original point set $P$ in step S2, k-NN is used to find its k nearest points to form a neighborhood;
Specifically, for each point $p_i$ of the original point set $P$ input to the network, a set of nearest neighboring points $p_n$ is found using k-NN, and the coordinate information is then embedded:
$$e_{in} = \mathrm{LBR}\left(p_i,\ p_n,\ p_i - p_n,\ \lVert p_i - p_n \rVert\right)$$
where the coordinate feature $e_{in}$ is obtained from the spatial position relation between the point $p_i$ and its neighboring point $p_n$; specifically, the absolute coordinates $p_i$ and $p_n$ of the two points, their offset $p_i - p_n$ and their spatial distance $\lVert p_i - p_n \rVert$ are concatenated; the symbol LBR denotes that the concatenated feature vector passes through a Linear layer, a BatchNorm layer and a ReLU layer in sequence, which maps $e_{in}$ in the graph convolution module to the same dimension as the point set features it receives as input.
Then, the relation between the point $p_i$ and the neighboring point $p_n$ is expressed as the edge relation $E_{in}$:
$$E_{in} = R\left(g\left(\left[F_{t-1},\ e_{in}\right]\right)\right)$$
where the point set feature $F_{t-1}$ input to the $t$-th graph convolution module and its coordinate feature $e_{in}$ are concatenated and then weighted with a learnable weight $g$, which can be realized with an MLP, a 1D-CNN or the like; $R$ denotes the ReLU layer. Finally, Max-Pooling aggregates the edge relations of each point channel by channel, and random sampling is used to reduce the number of points, giving the output feature $F_t$.
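The following is a minimal sketch of such a graph convolution module, assuming a single Linear layer for the learnable weight $g$ and a torch.cdist-based k-NN search; all dimensions are illustrative:

```python
# Graph convolution module: coordinate embedding e_in = LBR(p_i, p_n,
# p_i - p_n, |p_i - p_n|), learnable weighting g, ReLU, channel-wise
# max-pooling over the k neighbors, and random down-sampling.
import torch
import torch.nn as nn

class GraphConvModule(nn.Module):
    def __init__(self, in_dim, out_dim, k=16):
        super().__init__()
        self.k = k
        # LBR: Linear -> BatchNorm -> ReLU, mapping the 10-d concatenation
        # (p_i, p_n, offset, distance) to the input feature dimension
        self.lbr = nn.Sequential(nn.Linear(10, in_dim),
                                 nn.BatchNorm1d(in_dim),
                                 nn.ReLU())
        self.g = nn.Linear(2 * in_dim, out_dim)   # learnable weighting g

    def forward(self, xyz, feat, n_out):
        # xyz: [N, 3], feat: [N, in_dim], n_out: points kept after sampling
        n = xyz.size(0)
        # k nearest neighbors (each point is included among its own neighbors)
        knn = torch.cdist(xyz, xyz).topk(self.k, largest=False).indices
        pi = xyz.unsqueeze(1).expand(-1, self.k, -1)        # [N, k, 3]
        pn = xyz[knn]                                       # [N, k, 3]
        dist = (pi - pn).norm(dim=-1, keepdim=True)         # [N, k, 1]
        e = torch.cat([pi, pn, pi - pn, dist], dim=-1)      # [N, k, 10]
        e = self.lbr(e.reshape(-1, 10)).reshape(n, self.k, -1)
        edge = torch.relu(self.g(torch.cat([feat[knn], e], dim=-1)))
        out = edge.max(dim=1).values                        # channel-wise max-pool
        keep = torch.randperm(n, device=xyz.device)[:n_out] # random down-sampling
        return xyz[keep], out[keep]
```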
Then, the feature vector $F_2$ is calculated from the neighborhood of each point, where the dimension of each point is expressed as $(\mu_1, \mu_2, \mu_3, \mu_{r_1}, \mu_{r_2}, \mu_{r_3}, \sigma_{r_1}^2, \sigma_{r_2}^2, \sigma_{r_3}^2)$;
The calculation process is as follows:
$$\mu_i = \frac{1}{k}\sum_{n=1}^{k} x_n^{(i)}, \qquad \mu_{r_i} = \frac{1}{k}\sum_{n=1}^{k} c_n^{(i)}, \qquad \sigma_{r_i}^2 = \frac{1}{k}\sum_{n=1}^{k}\left(c_n^{(i)} - \mu_{r_i}\right)^2$$
where $\mu_i$ represents the mean of the neighborhood coordinate channels of each point, $\mu_{r_i}$ the mean of the neighborhood color channels, and $\sigma_{r_i}^2$ the variance of the neighborhood color channels; $i$ takes 1, 2, 3; the self-learning process is set up so that $F_1$ and $F_2$ correspond channel by channel and a feature distance is calculated for each feature channel; $n$ is the index over the k neighboring points of each point. Because the point clouds of urban street scenes are sparsely and unevenly distributed, the variance of the neighborhood coordinates is large, so the network uses only the coordinate mean. Objects such as vegetation (green) and pavement (gray) have distinctive color characteristics and their local color generally changes smoothly, so both the mean and the variance are adopted for the colors.
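A minimal sketch of this neighborhood statistics computation, assuming a SciPy k-d tree for the k-NN search:

```python
# Per point: mean of the k-NN coordinate channels, plus mean and variance of
# the k-NN color channels, giving the 9-dimensional target feature F2
# (k = 16 as in the embodiment).
import numpy as np
from scipy.spatial import cKDTree

def neighborhood_features(xyz, rgb, k=16):
    # xyz: [N, 3] coordinates, rgb: [N, 3] colors
    idx = cKDTree(xyz).query(xyz, k=k)[1]          # [N, k] neighbor indices
    mu_xyz = xyz[idx].mean(axis=1)                 # coordinate channel means
    mu_rgb = rgb[idx].mean(axis=1)                 # color channel means
    var_rgb = rgb[idx].var(axis=1)                 # color channel variances
    return np.concatenate([mu_xyz, mu_rgb, var_rgb], axis=1)   # F2: [N, 9]
```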
The step S4 specifically includes:
Assume that the original point set $P$ input to the graph convolution network in step S2 contains $N$ points; the coordinate distance is calculated as the Euclidean distance, and the color distance as the Manhattan distance;
The loss function of the coordinate distance, $L_{xyz}$, is:
$$L_{xyz} = \frac{1}{N}\sum_{\alpha=1}^{N}\sqrt{\sum_{i=1}^{3}\left(F_1^{(i)}(\alpha) - F_2^{(i)}(\alpha)\right)^2}$$
The loss function of the color distance, $L_{rgb}$, is:
$$L_{rgb} = \frac{1}{N}\sum_{\alpha=1}^{N}\sum_{i=4}^{9}\left|F_1^{(i)}(\alpha) - F_2^{(i)}(\alpha)\right|$$
Finally, the loss function is:
$$L = \lambda_1 L_{xyz} + \lambda_2 L_{rgb}$$
where $\alpha$ is the index of each point in the original point set $P$, and $\lambda_1$ and $\lambda_2$ are two hyperparameters of the graph convolution network, set to 1/3 and 2/3 respectively; the loss function is used to train the graph convolution network in step S2 and further adjust the parameters of its encoder and decoder.
Steps S2-S4 realize the fine-tuning of the parameters of the pre-trained graph convolution network. Specifically, in this embodiment:
(1) The pre-trained encoder and decoder are fixed first, and the fully connected layer after the decoder is changed into a multi-layer perceptron (i.e. MLP) whose per-point feature dimension change is set to (16, 32, 9). A point cloud $P$ is constructed from the target semantic segmentation dataset $D$ (constructed in the same way as in the preceding pre-training step; only the dataset changes), and the point cloud passes through the network of FIG. 1 to output the feature $F_1$ with dimension [65536, 9].
(2) At the same time, a neighborhood is constructed with k-NN for each point of $P$, with the number of neighborhood points k set to 16. The features of one point are calculated as follows:
$$\mu_i = \frac{1}{k}\sum_{n=1}^{k} x_n^{(i)}, \qquad \mu_{r_i} = \frac{1}{k}\sum_{n=1}^{k} c_n^{(i)}, \qquad \sigma_{r_i}^2 = \frac{1}{k}\sum_{n=1}^{k}\left(c_n^{(i)} - \mu_{r_i}\right)^2$$
where $\mu_i$ represents the mean of the neighborhood coordinate channels of the point, $\mu_{r_i}$ the mean of its neighborhood color channels, and $\sigma_{r_i}^2$ the variance of its neighborhood color channels; $i$ takes 1, 2, 3; the self-learning process is set up so that $F_1$ and $F_2$ correspond channel by channel and a feature distance is calculated for each feature channel; $n$ is the index over the k neighboring points of each point. The features of this point are expressed as:
$$\left(\mu_1, \mu_2, \mu_3, \mu_{r_1}, \mu_{r_2}, \mu_{r_3}, \sigma_{r_1}^2, \sigma_{r_2}^2, \sigma_{r_3}^2\right)$$
Then the features of all points of the constructed $P$ form $F_2$ with dimension [65536, 9].
(3) The distance between $F_1$ and $F_2$ is calculated as the loss function for training the network of FIG. 1. The features associated with the coordinates are compared with the Euclidean distance and the features associated with the colors with the Manhattan distance. The coordinate distance loss function $L_{xyz}$ is:
$$L_{xyz} = \frac{1}{N}\sum_{\alpha=1}^{N}\sqrt{\sum_{i=1}^{3}\left(F_1^{(i)}(\alpha) - F_2^{(i)}(\alpha)\right)^2}$$
The color distance loss function $L_{rgb}$ is:
$$L_{rgb} = \frac{1}{N}\sum_{\alpha=1}^{N}\sum_{i=4}^{9}\left|F_1^{(i)}(\alpha) - F_2^{(i)}(\alpha)\right|$$
Finally, the loss function is:
$$L = \lambda_1 L_{xyz} + \lambda_2 L_{rgb}$$
where $\alpha$ is the index of each point in the original point set $P$, and $\lambda_1$ and $\lambda_2$ are two hyperparameters, set to 1/3 and 2/3.
Training is optimized with stochastic gradient descent for a fixed 30 rounds. This pre-training fine-tunes the parameters of the encoder and decoder so that they adapt to the encoding of the dataset $D$.
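The fine-tuning loss can be sketched as follows, assuming $F_1$ and $F_2$ are [N, 9] tensors with the 3 coordinate channels first; the mean reduction over points is an assumption:

```python
# Self-learning loss: Euclidean distance over the 3 coordinate channels,
# Manhattan distance over the 6 color channels (means and variances),
# weighted by 1/3 and 2/3.
import torch

def self_supervised_loss(f1, f2, lam1=1.0 / 3, lam2=2.0 / 3):
    # f1, f2: [N, 9] feature tensors, coordinate channels first
    l_xyz = (f1[:, :3] - f2[:, :3]).pow(2).sum(dim=1).sqrt().mean()  # Euclidean
    l_rgb = (f1[:, 3:] - f2[:, 3:]).abs().sum(dim=1).mean()          # Manhattan
    return lam1 * l_xyz + lam2 * l_rgb
```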
(III) Generating pseudo labels and performing semantic segmentation with the semi-supervised learning network (realized by the following steps S5 and S6)
S5, gathering the original pointsAs target semantic segmentation dataset +.>Which contains a small amount of tagged data and a large amount of untagged data, wherein the amount of tagged data occupies the original point set +.>The ratio of the label data is 1% -10%, and then pseudo labels are distributed to the label-free data in the semi-supervised learning network by using the label data;
s6, distributing the pseudo tag in the step S5For network reasoning, semantics divide and predict the category of each point.
The step S5 specifically comprises the following steps:
S51, taking the original point set $P$ as the target semantic segmentation dataset $D$; $D$ is a set containing $N$ points; let the point set of labeled data in the original point set $P$ be $D_L$ with $N_L$ points, and the point set of unlabeled data be $D_U$ with $N_U$ points; then $D = D_L \cup D_U$ and $N_L + N_U = N$;
S52, using the encoder and decoder trained and adjusted in step S4, replacing the MLP that outputs 9 dimensions in step S4 with an MLP that outputs $d$ dimensions, and denoting the output $d$-dimensional vector as $f$;
S53, denoting the features of $f$ corresponding to labeled points as $f^L$ and the features corresponding to unlabeled points as $f^U$; then $f = f^L \cup f^U$;
where $f^L$ and $f^U$ are all $d$-dimensional vectors and indices are used to distinguish different points, and $D_L$ contains $C'$ categories, $0 < C' \le C$, where $C$ is the actual number of categories to be semantically segmented;
S54, selecting the points belonging to category $c$ from the known labeled data and calculating the feature average of these points to obtain the category average feature vector $\bar{v}_c$:
$$\bar{v}_c = \frac{1}{N_c}\sum_{j:\,y_j = c} f_j^L$$
where $N_c$ denotes the number of points of category $c$ and $y_j$ denotes the category of the point corresponding to $f_j^L$; then, for the $N$ points of the input, the average feature vector $\bar{v}_c$ is calculated for each of the $C'$ categories, where $1 \le c \le C'$; for the remaining categories not present in $D_L$, $\bar{v}_c$ is recorded as the zero vector;
S55, calculating the similarity matrix $sim$ between the feature vector $f_s^U$ of each point of the unlabeled data and $\bar{v}_c$:
$$sim_s^c = \exp\left(-\left\lVert \bar{v}_c - f_s^U \right\rVert_2\right)$$
where $\lVert \bar{v}_c - f_s^U \rVert_2$ represents the Euclidean distance between the category average feature vector and the vector corresponding to the unlabeled point; the superscript of $sim$ denotes the category and the subscript denotes a point, with $0 < sim_s^c \le 1$; $e$ is the base of the natural logarithm and the bracketed term is its exponent; the dimension of $sim$ is $N_U \times C$;
S56, mapping the feature vector $f$ of step S53 to the vector $z$ as the prediction result, where the dimension of $z$ at each point is $C$;
For the $N_L$ labeled points, category prediction is realized directly with a Softmax classifier and a cross entropy loss function; the loss function calculated on these points is $L_{seg}$;
For the $N_U$ unlabeled points, pseudo labels are first generated and then compared with $z$; if points with low pseudo-label confidence were used, larger errors would be produced in the segmentation result. Therefore, the $N_{top}$ points with the highest confidence can be selected from the similarity matrix $sim$ category by category, and the category with the highest confidence then selected point by point for the selected points.
Specifically: the $N_{top}$ points with the highest confidence are selected from the similarity matrix $sim$ category by category; assume $S$ points are selected in total ($0 \le S \le N_U$); the category with the highest confidence is selected point by point for the selected points, and the maximum confidence and the corresponding label values of the pseudo labels of the $S$ points are updated;
The prediction loss function of the $S$ unlabeled points is designed as:
$$L_{pse} = -\frac{1}{S}\sum_{s}\mathbb{1}\left(s \in \mathcal{S}\right)\sum_{m=1}^{C'}\mathbb{1}\left(\hat{y}_s = m\right)\log p_{s,m}$$
where the subscript $s$ represents any one of the $N_U$ unlabeled points, i.e. the index over the unlabeled points; $C'$ is the number of categories contained in $D_L$ and $m$ the index over the categories; $p_{s,m}$ represents the probability value of the final predicted label and $\hat{y}_s$ the pseudo label category; the indicator $\mathbb{1}(\hat{y}_s = m)$ takes 1 when the pseudo label category and the predicted category are the same and 0 otherwise, and $\mathbb{1}(s \in \mathcal{S})$ takes 1 when the point is among the $S$ selected points and 0 otherwise;
S57, the loss function $L_{total}$ of the whole graph convolution network is:
$$L_{total} = L_{seg} + w \, L_{pse}$$
where the weight $w$ is:
$$w = \frac{epoch}{max\text{-}epoch}$$
where $epoch$ represents the current training round and $max\text{-}epoch$ the maximum training round; at the beginning, a smaller weight $w$ is thus used.
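A minimal sketch of the combined loss follows; the linear ramp $w = epoch / max\text{-}epoch$ is an assumption consistent with using a smaller weight at the beginning of training:

```python
# Combined loss: cross entropy on labeled points plus a pseudo-label cross
# entropy restricted to the S selected unlabeled points, ramped up over
# training rounds.
import torch
import torch.nn.functional as F

def total_loss(logits_l, labels_l, logits_u, pseudo, selected, epoch, max_epoch):
    # logits_l: [N_L, C], labels_l: [N_L]
    # logits_u: [N_U, C], pseudo: [N_U] pseudo labels, selected: [N_U] bool mask
    l_seg = F.cross_entropy(logits_l, labels_l)
    if selected.any():
        l_pse = F.cross_entropy(logits_u[selected], pseudo[selected])
    else:
        l_pse = logits_u.new_zeros(())
    w = epoch / max_epoch                 # smaller pseudo-label weight early on
    return l_seg + w * l_pse
```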
The step S6 specifically includes:
The network is trained by iterating the pseudo-label assignment process of steps S51-S57 until it converges on the target dataset $D$; in the final prediction process, the trained network is used with the calculation of the similarity matrix $sim$ removed; a Softmax classifier is applied to all points of $P$, then the remaining points of the dataset are read in and the process is iterated, realizing semantic segmentation and category prediction of all points in the target dataset $D$.
Specifically, this embodiment changes the layer after the decoder in FIG. 1 into two MLPs: the per-point feature dimension change of the first MLP is set to (16, 32, 32), outputting the feature $F_p$ with dimension [65536, 32]; the per-point feature dimension change of the second MLP is set to (32, 32, 8), outputting the feature $F_c$ with dimension [65536, 8]. The modified network structure is shown in FIG. 2.
The target semantic segmentation dataset $D$ must contain a small amount of labeled data and a large amount of unlabeled data, and step S5 assigns pseudo labels to the unlabeled data using the labeled data. A point cloud $P$ is constructed from the target semantic segmentation dataset $D$ (constructed in the same way as in the preceding self-learning pre-training task); each $P$ constructed during semi-supervised training must have 1%-10% of its points carrying label information. For example, in one training round, the 65536 points contain 4096 labeled points covering 5 categories in total, so the label information accounts for 6.25%. $F_p$ is used to calculate the feature average of the labeled points corresponding to each category, giving the category average feature vector $\bar{v}_c$:
$$\bar{v}_c = \frac{1}{N_c}\sum_{j:\,y_j = c} f_j^L$$
where $f_j^L$ denotes the feature in $F_p$ corresponding to a labeled point of category $c$. Then, for the 4096 labeled points of the input, the average feature vector $\bar{v}_c$ is calculated for each category, where $1 \le c \le 5$. For the remaining 3 categories not present in the labeled data, $\bar{v}_c$ is recorded as the zero vector.
Next, the similarity matrix $sim$ between the feature vector $f_s^U$ of each unlabeled point and $\bar{v}_c$ is calculated:
$$sim_s^c = \exp\left(-\left\lVert \bar{v}_c - f_s^U \right\rVert_2\right)$$
where the dimension of $sim$ is [61440, 8].
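A minimal sketch of the category average feature vectors and the similarity matrix; the exponent form follows the reconstruction above, and the feature dimension 32 matches $F_p$:

```python
# Category means over labeled F_p features, then sim = exp(-||v_c - f_s||_2);
# categories absent from the labeled points keep a zero mean vector, as in
# the embodiment.
import torch

def similarity_matrix(feats_l, labels_l, feats_u, n_classes=8):
    # feats_l: [N_L, 32] labeled F_p features, labels_l: [N_L]
    # feats_u: [N_U, 32] unlabeled F_p features
    v = feats_l.new_zeros(n_classes, feats_l.size(1))
    for c in range(n_classes):
        mask = labels_l == c
        if mask.any():                    # absent categories stay zero vectors
            v[c] = feats_l[mask].mean(dim=0)
    dist = torch.cdist(feats_u, v)        # [N_U, n_classes] Euclidean distances
    return torch.exp(-dist)               # similarities in (0, 1]
```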
Then, the feature vector $F_p$ is mapped to the vector $F_c$ as the prediction result; the dimension of $F_c$ is [65536, 8]. The 4096 labeled points realize category prediction directly with a Softmax classifier and a cross entropy loss function. For the unlabeled points, however, pseudo labels need to be generated for comparison with $F_c$. First, the $N_{top}$ points with the highest confidence are selected from the similarity matrix $sim$ category by category, and then the category with the highest confidence is selected point by point for the selected points. Assume $S$ points are selected in total ($0 \le S \le N_U$); the maximum confidence and the corresponding label values of the pseudo labels of these $S$ points are then updated. $N_{top}$ is set to 50% of the number of points of each category. The number of training rounds is set to 100 and Adam is selected as the optimization method.
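The pseudo-label selection can be sketched as follows; the per-category quota argument is an assumption, since the embodiment only states that $N_{top}$ is 50% of the number of points of each category:

```python
# Category by category, take the most confident unlabeled points from sim;
# each selected point then receives the category of its highest confidence.
import torch

def select_pseudo_labels(sim, quotas):
    # sim: [N_U, C] similarity matrix; quotas: per-category N_top values
    n_u, n_c = sim.shape
    selected = torch.zeros(n_u, dtype=torch.bool)
    for c in range(n_c):
        n_top = min(int(quotas[c]), n_u)
        if n_top > 0:
            selected[sim[:, c].topk(n_top).indices] = True
    conf, pseudo = sim.max(dim=1)         # point-by-point best category
    return pseudo, conf, selected         # label values, confidences, S mask
```

With this sketch, the mask of the $S$ selected points and the pseudo labels feed directly into the combined loss sketched earlier.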
Finally, the trained network is used with the calculation of the similarity matrix $sim$ removed, i.e. the calculations from $F_p$ to $sim$ and from $sim$ to the pseudo labels are removed. Then, for each point cloud $P$ input to the network at test time, a Softmax classifier is used for prediction, and $P$ is constructed iteratively until all points of $D$ have been read and label values predicted for them, realizing semantic segmentation.
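A minimal sketch of this chunked inference; the function and parameter names are assumptions:

```python
# Final inference: the similarity branch is dropped and only the segmentation
# head is used; the dataset is consumed in 65536-point chunks until every
# point has a predicted label value.
import torch

@torch.no_grad()
def predict_dataset(model, chunks, device="cuda"):
    model.eval().to(device)
    preds = []
    for points in chunks:                # each chunk: [65536, 6] xyz + rgb
        logits = model(points.to(device).unsqueeze(0))      # [1, 65536, 8]
        preds.append(logits.softmax(dim=-1).argmax(dim=-1).squeeze(0).cpu())
    return torch.cat(preds)              # one label value for every point
```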
The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.
Claims (4)
1. An urban scene semantic segmentation method based on graph convolution and a semi-supervised learning network, characterized by comprising the following steps:
S1, pre-training a graph convolution network by using a public, labeled urban road dataset to obtain the initialization parameters of each layer in the graph convolution network;
S2, inputting an original point set $P$ into the initialized graph convolution network at one time, where the points in $P$ contain only the coordinate (xyz) and color (rgb) information, and outputting the feature vector $F_1$;
The step S2 specifically comprises the following steps:
S21, encoding the original point set $P$ with the encoder of the graph convolution network to obtain the encoding feature $F_E$; for an original point set $P$ input into the graph convolution network at one time, where the points contain only the 6-dimensional features of coordinates xyz and colors rgb, denoted as feature $F_0$, $F_0$ is mapped to 16 dimensions through an MLP; the output feature serves as the input of the first graph convolution module, the output of each graph convolution module is the input of the next one, and after passing through the four graph convolution modules the feature $F_4$ is output, i.e. the final encoding feature $F_E$;
S22, decoding the encoding feature $F_E$ with the decoder of the graph convolution network to obtain the decoding feature $F_D$;
S23, mapping the decoding feature $F_D$ through an MLP to the feature vector $F_1$; the dimension of each point in $F_1$ is expressed as $(q_1, q_2, q_3, r_1^1, r_2^1, r_3^1, r_1^2, r_2^2, r_3^2)$, where $q$ and $r$ represent the encoded coordinate and color features respectively, the subscript of $r$ denotes the feature channel, and in the superscript of $r$, 1 denotes the mean and 2 denotes the variance;
S3, for each point in the original point set $P$ of step S2, using k-NN to find its k nearest points to form a neighborhood, and calculating the feature vector $F_2$ from the neighborhood of each point;
S4, calculating the distance between the feature vectors $F_1$ and $F_2$ as a loss function for adjusting the parameters of the graph convolution network in step S2;
The step S4 specifically includes:
Assume that the original point set $P$ input to the graph convolution network in step S2 contains $N$ points; the coordinate distance is calculated as the Euclidean distance, and the color distance as the Manhattan distance;
The loss function of the coordinate distance, $L_{xyz}$, is:
$$L_{xyz} = \frac{1}{N}\sum_{\alpha=1}^{N}\sqrt{\sum_{i=1}^{3}\left(F_1^{(i)}(\alpha) - F_2^{(i)}(\alpha)\right)^2}$$
The loss function of the color distance, $L_{rgb}$, is:
$$L_{rgb} = \frac{1}{N}\sum_{\alpha=1}^{N}\sum_{i=4}^{9}\left|F_1^{(i)}(\alpha) - F_2^{(i)}(\alpha)\right|$$
Finally, the loss function is:
$$L = \lambda_1 L_{xyz} + \lambda_2 L_{rgb}$$
where $\alpha$ is the index of each point in the original point set $P$, and $\lambda_1$ and $\lambda_2$ are two hyperparameters of the graph convolution network, set to 1/3 and 2/3 respectively; the loss function is used to train the graph convolution network in step S2 and further adjust the parameters of its encoder and decoder;
S5, taking the original point set $P$ as the target semantic segmentation dataset $D$, which contains labeled data and unlabeled data, where the labeled data accounts for 1%-10% of the original point set $P$; then, in the semi-supervised learning network, using the labeled data to assign pseudo labels to the unlabeled data;
S6, using the data with pseudo labels assigned in step S5 for network inference, performing semantic segmentation and predicting the category of each point.
2. The urban scene semantic segmentation method based on graph convolution and semi-supervised learning network as set forth in claim 1, wherein: the step S3 specifically comprises the following steps:
For each point of the original point set $P$ in step S2, k-NN is used to find its k nearest points to form a neighborhood, and the feature vector $F_2$ is calculated from the neighborhood of each point, where the dimension of each point is expressed as $(\mu_1, \mu_2, \mu_3, \mu_{r_1}, \mu_{r_2}, \mu_{r_3}, \sigma_{r_1}^2, \sigma_{r_2}^2, \sigma_{r_3}^2)$;
The calculation process is as follows:
$$\mu_i = \frac{1}{k}\sum_{n=1}^{k} x_n^{(i)}, \qquad \mu_{r_i} = \frac{1}{k}\sum_{n=1}^{k} c_n^{(i)}, \qquad \sigma_{r_i}^2 = \frac{1}{k}\sum_{n=1}^{k}\left(c_n^{(i)} - \mu_{r_i}\right)^2$$
where $\mu_i$ represents the mean of the neighborhood coordinate channels of each point, $\mu_{r_i}$ the mean of the neighborhood color channels of each point, and $\sigma_{r_i}^2$ the variance of the neighborhood color channels of each point; $i$ takes 1, 2, 3, with $F_1$ and $F_2$ set to correspond channel by channel; $n$ represents the index over the k neighboring points of each point.
3. The urban scene semantic segmentation method based on graph convolution and semi-supervised learning network as set forth in claim 2, wherein: the step S5 specifically comprises the following steps:
S51, taking the original point set $P$ as the target semantic segmentation dataset $D$; $D$ is a set containing $N$ points; let the point set of labeled data in the original point set $P$ be $D_L$ with $N_L$ points, and the point set of unlabeled data be $D_U$ with $N_U$ points; then $D = D_L \cup D_U$ and $N_L + N_U = N$;
S52, using the encoder and decoder trained and adjusted in step S4, replacing the MLP that outputs 9 dimensions in step S4 with an MLP that outputs $d$ dimensions, and denoting the output $d$-dimensional vector as $f$;
S53, denoting the features of $f$ corresponding to labeled points as $f^L$ and the features corresponding to unlabeled points as $f^U$; then $f = f^L \cup f^U$;
where $f^L$ and $f^U$ are all $d$-dimensional vectors and indices are used to distinguish different points, and $D_L$ contains $C'$ categories, $0 < C' \le C$, where $C$ is the actual number of categories to be semantically segmented;
S54, selecting the points belonging to category $c$ from the known labeled data and calculating the feature average of these points to obtain the category average feature vector $\bar{v}_c$:
$$\bar{v}_c = \frac{1}{N_c}\sum_{j:\,y_j = c} f_j^L$$
where $N_c$ denotes the number of points of category $c$ and $y_j$ denotes the category of the point corresponding to $f_j^L$; then, for the $N$ points of the input, the average feature vector $\bar{v}_c$ is calculated for each of the $C'$ categories, where $1 \le c \le C'$; for the remaining categories not present in $D_L$, $\bar{v}_c$ is recorded as the zero vector;
S55, calculating the similarity matrix $sim$ between the feature vector $f_s^U$ of each point of the unlabeled data and $\bar{v}_c$:
$$sim_s^c = \exp\left(-\left\lVert \bar{v}_c - f_s^U \right\rVert_2\right)$$
where $\lVert \bar{v}_c - f_s^U \rVert_2$ represents the Euclidean distance between the category average feature vector and the vector corresponding to the unlabeled point; the superscript of $sim$ denotes the category and the subscript denotes any one of the points, with $0 < sim_s^c \le 1$; $e$ is the base of the natural logarithm and the bracketed term is its exponent; the dimension of $sim$ is $N_U \times C$;
S56, mapping the feature vector $f$ of step S53 to the vector $z$ as the prediction result, where the dimension of $z$ at each point is $C$;
For the $N_L$ labeled points, category prediction is realized directly with a Softmax classifier and a cross entropy loss function; the loss function calculated on these points is $L_{seg}$;
For the $N_U$ unlabeled points, pseudo labels are first generated and then compared with $z$; specifically: the $N_{top}$ points with the highest confidence are first selected from the similarity matrix $sim$ category by category; assume $S$ points are selected in total, $0 \le S \le N_U$; then the category with the highest confidence is selected point by point for the selected points, and the maximum confidence and the corresponding label values of the pseudo labels of the $S$ points are updated;
The prediction loss function of the $S$ unlabeled points is designed as:
$$L_{pse} = -\frac{1}{S}\sum_{s}\mathbb{1}\left(s \in \mathcal{S}\right)\sum_{m=1}^{C'}\mathbb{1}\left(\hat{y}_s = m\right)\log p_{s,m}$$
where the subscript $s$ represents any one of the $N_U$ unlabeled points, i.e. the index over the unlabeled points; $C'$ is the number of categories contained in $D_L$ and $m$ the index over the categories; $p_{s,m}$ represents the probability value of the final predicted label and $\hat{y}_s$ the pseudo label category; the indicator $\mathbb{1}(\hat{y}_s = m)$ takes 1 when the pseudo label category and the predicted category are the same and 0 otherwise, and $\mathbb{1}(s \in \mathcal{S})$ takes 1 when the point is among the $S$ selected points and 0 otherwise;
S57, the loss function $L_{total}$ of the whole graph convolution network is:
$$L_{total} = L_{seg} + w \, L_{pse}$$
where the weight $w$ is:
$$w = \frac{epoch}{max\text{-}epoch}$$
where $epoch$ represents the current training round and $max\text{-}epoch$ the maximum training round; at the beginning, a smaller weight $w$ is thus used.
4. The urban scene semantic segmentation method based on graph convolution and semi-supervised learning network as set forth in claim 3, wherein: the step S6 specifically includes:
The network is trained by iterating the pseudo-label assignment process of steps S51-S57 until it converges on the target dataset $D$; in the final prediction process, the trained network is used with the calculation of the similarity matrix $sim$ removed; a Softmax classifier is applied to all points of $P$, then the remaining points of the dataset are read in and the process is iterated, realizing semantic segmentation and category prediction of all points in the target dataset $D$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310596881.7A CN116310350B (en) | 2023-05-25 | 2023-05-25 | Urban scene semantic segmentation method based on graph convolution and semi-supervised learning network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116310350A CN116310350A (en) | 2023-06-23 |
CN116310350B true CN116310350B (en) | 2023-08-18 |
Family
ID=86785552
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310596881.7A Active CN116310350B (en) | 2023-05-25 | 2023-05-25 | Urban scene semantic segmentation method based on graph convolution and semi-supervised learning network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116310350B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116863432B (en) * | 2023-09-04 | 2023-12-22 | 之江实验室 | Weak supervision laser travelable region prediction method and system based on deep learning |
CN117576217B (en) * | 2024-01-12 | 2024-03-26 | 电子科技大学 | Object pose estimation method based on single-instance image reconstruction |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112070779A (en) * | 2020-08-04 | 2020-12-11 | 武汉大学 | Remote sensing image road segmentation method based on convolutional neural network weak supervised learning |
CN112785611A (en) * | 2021-01-29 | 2021-05-11 | 昆明理工大学 | 3D point cloud weak supervision semantic segmentation method and system |
CN112861722A (en) * | 2021-02-09 | 2021-05-28 | 中国科学院地理科学与资源研究所 | Remote sensing land utilization semantic segmentation method based on semi-supervised depth map convolution |
CN113936217A (en) * | 2021-10-25 | 2022-01-14 | 华中师范大学 | Priori semantic knowledge guided high-resolution remote sensing image weakly supervised building change detection method |
CN114187446A (en) * | 2021-12-09 | 2022-03-15 | 厦门大学 | Cross-scene contrast learning weak supervision point cloud semantic segmentation method |
US11450008B1 (en) * | 2020-02-27 | 2022-09-20 | Amazon Technologies, Inc. | Segmentation using attention-weighted loss and discriminative feature learning |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113570610B (en) * | 2021-07-26 | 2022-05-13 | 北京百度网讯科技有限公司 | Method and device for performing target segmentation on video by adopting semantic segmentation model |
- 2023-05-25: CN application CN202310596881.7A filed; granted as patent CN116310350B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN116310350A (en) | 2023-06-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |