CN116310350B - Urban scene semantic segmentation method based on graph convolution and semi-supervised learning network - Google Patents

Urban scene semantic segmentation method based on graph convolution and semi-supervised learning network

Info

Publication number
CN116310350B
CN116310350B
Authority
CN
China
Prior art keywords
point
points
network
category
steps
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310596881.7A
Other languages
Chinese (zh)
Other versions
CN116310350A (en)
Inventor
Wang Cheng (王程)
Chen Jun (陈钧)
Chen Yiping (陈一平)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN202310596881.7A priority Critical patent/CN116310350B/en
Publication of CN116310350A publication Critical patent/CN116310350A/en
Application granted granted Critical
Publication of CN116310350B publication Critical patent/CN116310350B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an urban scene semantic segmentation method based on graph convolution and a semi-supervised learning network, which comprises the following steps: S1, pre-training a graph convolution network to obtain initialization parameters; S2, inputting an original point set P at one time and outputting the feature vector F_o; S3, for the original point set P, computing the feature vector F_c from the neighborhood of each point; S4, computing the distance between the feature vectors F_o and F_c as a loss function to adjust the parameters of the graph convolution network; S5, using the labeled data to assign pseudo labels to the unlabeled data; S6, using the network with the assigned pseudo labels to perform semantic segmentation and predict the category of each point. The method can realize semantic segmentation of urban road scenes with only a small amount of labeled data.

Description

Urban scene semantic segmentation method based on graph convolution and semi-supervised learning network
Technical Field
The invention relates to the field of computer graphics, and in particular to an urban scene semantic segmentation method based on graph convolution and a semi-supervised learning network.
Background
As unstructured three-dimensional data, point clouds characterize objects more accurately and flexibly than data formats such as voxels and meshes, and have wide application in the field of smart cities. For example, in urban construction planning, digital traffic maps generated from point clouds assist the planning of traffic lines and urban construction, improving planning efficiency and precision; in environment monitoring and analysis, point cloud data can be used to build three-dimensional models of actual scenes for analyzing landforms, hydrogeology, building damage and other conditions, facilitating city management and maintenance.
In practical smart city applications, the point cloud pipeline from acquisition to application can generally be divided into the following five steps: (1) point cloud acquisition; (2) point cloud preprocessing; (3) point cloud feature extraction; (4) point cloud semantic segmentation; (5) downstream model deployment and application.
One difficulty with the above steps is that feature extraction and semantic segmentation require a large amount of annotation data for model training. For feature extraction and semantic segmentation, traditional methods use hand-crafted feature descriptors, while deep learning methods use neural networks to extract features automatically, each then realizing semantic segmentation. However, the training process of these methods is usually supervised learning, i.e. a large amount of labeled data is required for model training. The point clouds of urban scenes are huge in scale, and manually labeling all points is cumbersome and expensive.
Disclosure of Invention
The invention aims to overcome the need for a large amount of annotation data in urban scene semantic segmentation algorithms, and provides an urban scene semantic segmentation method based on graph convolution and a semi-supervised learning network.
The urban scene semantic segmentation method based on graph convolution and a semi-supervised learning network comprises the following steps:
S1, pre-training a graph convolution network with a public, labeled urban road dataset to obtain the initialization parameters of each layer in the graph convolution network;
S2, inputting an original point set P into the initialized graph convolution network at one time, where the points in P contain only coordinate xyz and color rgb information, and outputting the feature vector F_o;
S3, for each point of the original point set P in step S2, using k-NN to find its k nearest points to form a neighborhood, and computing the feature vector F_c from the neighborhood of each point;
S4, computing the distance between the feature vectors F_o and F_c as a loss function for adjusting the parameters of the graph convolution network in step S2;
S5, taking the original point set P as the target semantic segmentation dataset D, which contains labeled data and unlabeled data, where the labeled data accounts for 1%-10% of the points of the original point set P, and then using the labeled data to assign pseudo labels to the unlabeled data in the semi-supervised learning network;
S6, using the network with the pseudo labels assigned in step S5 for inference, performing semantic segmentation and predicting the category of each point.
Further, the step S2 specifically comprises:
S21, encoding the original point set P with the encoder of the graph convolution network to obtain the encoding feature F_E;
S22, decoding the encoding feature F_E with the decoder of the graph convolution network to obtain the decoding feature F_D;
S23, mapping the decoding feature F_D through an MLP to the feature vector F_o, where the dimension of each point in F_o is expressed as (x̄, ȳ, z̄, r₁¹, r₂¹, r₃¹, r₁², r₂², r₃²), with (x̄, ȳ, z̄) and the r terms representing the encoded coordinate and color features respectively; the subscript of r denotes the feature channel, a superscript 1 denotes the mean, and a superscript 2 denotes the variance.
Further, the step S3 specifically comprises:
For each point of the original point set P in step S2, using k-NN to find its k nearest points to form a neighborhood, and computing the feature vector F_c from the neighborhood of each point, where the dimension of each point is expressed as (c̄₁, c̄₂, c̄₃, r̄₁, r̄₂, r̄₃, σ₁², σ₂², σ₃²);
The calculation process is as follows:

$$\bar{c}_i = \frac{1}{k}\sum_{n=1}^{k} c_i^n, \qquad \bar{r}_i = \frac{1}{k}\sum_{n=1}^{k} r_i^n, \qquad \sigma_i^2 = \frac{1}{k}\sum_{n=1}^{k}\left(r_i^n - \bar{r}_i\right)^2$$

where c̄_i denotes the mean of the neighborhood coordinate channels of each point, r̄_i denotes the mean of the neighborhood color channels of each point, σ_i² denotes the variance of the neighborhood color channels of each point, i takes 1, 2, 3 so that the self-learning process pairs each feature channel of F_o and F_c with a computed feature distance, and n is the index over the k neighboring points of each point.
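For concreteness, the following sketch computes F_c as just described. It is an illustrative NumPy implementation under stated assumptions (per-point layout [x, y, z, r, g, b], brute-force k-NN); the helper name is ours, not the patent's.

```python
import numpy as np

def neighborhood_features(points, k=16):
    """Compute F_c: per-point neighborhood coordinate means plus color
    means and variances, as described in step S3.

    points: (N, 6) array with columns x, y, z, r, g, b.
    Returns: (N, 9) array (3 coord means | 3 color means | 3 color variances).
    """
    xyz, rgb = points[:, :3], points[:, 3:]
    # Brute-force k-NN on coordinates; memory is O(N^2), so a KD-tree or
    # GPU neighbor search would replace this at the 65536-point scale.
    d2 = ((xyz[:, None, :] - xyz[None, :, :]) ** 2).sum(-1)   # (N, N)
    idx = np.argsort(d2, axis=1)[:, :k]                       # (N, k) neighbor indices
    c_mean = xyz[idx].mean(axis=1)   # mean of neighborhood coordinate channels
    r_mean = rgb[idx].mean(axis=1)   # mean of neighborhood color channels
    r_var = rgb[idx].var(axis=1)     # variance of neighborhood color channels
    return np.concatenate([c_mean, r_mean, r_var], axis=1)    # (N, 9)
```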
Further, the step S4 specifically comprises:
Assume that the original point set P input into the graph convolution network in step S2 contains N points; the coordinate distance is computed as the Euclidean distance and the color distance as the Manhattan distance;
The loss function of the coordinate distance L_xyz is:

$$L_{xyz} = \frac{1}{N}\sum_{\alpha=1}^{N}\left\| c_{\alpha}^{o} - c_{\alpha}^{c} \right\|_{2}$$

The loss function of the color distance L_rgb is:

$$L_{rgb} = \frac{1}{N}\sum_{\alpha=1}^{N}\left\| r_{\alpha}^{o} - r_{\alpha}^{c} \right\|_{1}$$

Finally, the loss function is:

$$L = \lambda_{1} L_{xyz} + \lambda_{2} L_{rgb}$$

where α is the index of each point in the original point set P, c_α^o and r_α^o are the coordinate and color parts of F_o, c_α^c and r_α^c are the coordinate and color parts of F_c, and λ₁ and λ₂ are two hyperparameters of the graph convolution network, set to 1/3 and 2/3 respectively; the loss function is used to train the graph convolution network in step S2 to further adjust the parameters of its encoder and decoder.
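A minimal sketch of this loss, assuming F_o and F_c are laid out as [3 coordinate channels | 6 color channels] per point; the function name and tensor layout are illustrative, not from the patent.

```python
import torch

def self_learning_loss(f_o, f_c, lam1=1/3, lam2=2/3):
    """Distance between network output F_o and neighborhood feature F_c.

    f_o, f_c: (N, 9) tensors. The coordinate part uses the Euclidean
    distance, the color part the Manhattan distance, combined with the
    weights lam1 = 1/3 and lam2 = 2/3 as in step S4.
    """
    l_xyz = torch.linalg.norm(f_o[:, :3] - f_c[:, :3], dim=1).mean()  # Euclidean
    l_rgb = torch.abs(f_o[:, 3:] - f_c[:, 3:]).sum(dim=1).mean()      # Manhattan
    return lam1 * l_xyz + lam2 * l_rgb
```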
Further, the step S5 specifically comprises:
S51, taking the original point set P as the target semantic segmentation dataset D; D is a set containing N points; in the original point set P, the point set with labeled data is denoted P_L with N_L points, and the point set of unlabeled data is denoted P_U with N_U points, so that P_L ∪ P_U = D and N_L + N_U = N;
S52, using the encoder and decoder trained and adjusted in step S4, replacing the MLP outputting 9 dimensions in step S4 with an MLP outputting d dimensions, and denoting the output d-dimensional vector of each point as f;
S53, willThe feature corresponding to the point containing the tag is expressed as +.>The feature corresponding to the unlabeled point is expressed asThe method comprises the steps of carrying out a first treatment on the surface of the Then->
wherein , and />All are->Vectors of dimensions and using indices for distinguishing between different points, and +.>Comprises the following componentsCategory 0 </o>≤/>,/>Is->The actual category number needed to be semantically divided;
s54, selecting the data belonging to the category from the known tagged dataAnd calculating the feature average of these points to obtain the class average feature vector +.>
wherein ,the expression category is +.>The number of points, +.>Representation->The category of the corresponding point is->Then, input +.>Calculating an average eigenvector +.>, wherein />The method comprises the steps of carrying out a first treatment on the surface of the For the followingIn the remaining non-existent categories,/->Marking as zero vector;
S55, computing the similarity matrix simi between the feature vectors f_s^U of the unlabeled points and f̄_j:

$$simi_s^j = e^{-\left\| \bar{f}_j - f_s^{U} \right\|_2}$$

where ‖f̄_j − f_s^U‖₂ is the Euclidean distance between the average feature vector of category j and the vector corresponding to the unlabeled point s; the superscript of simi denotes the category, the subscript of simi denotes a certain point, 0 < simi_s^j ≤ 1, e is the base of the natural logarithm with the bracketed term as its exponent, and the dimension of simi is c × N_U;
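The class-average feature vectors and the similarity matrix of steps S54-S55 can be sketched as follows; tensor shapes follow the text (f_label: (N_L, d), f_unlabel: (N_U, d)), and the helper names are ours.

```python
import torch

def class_prototypes(f_label, labels, num_classes):
    """Average feature vector per category; zero vector for absent classes."""
    protos = torch.zeros(num_classes, f_label.shape[1])
    for j in range(num_classes):
        mask = labels == j
        if mask.any():
            protos[j] = f_label[mask].mean(dim=0)   # f-bar_j over points of class j
    return protos                                   # (c, d)

def similarity_matrix(protos, f_unlabel):
    """simi[j, s] = exp(-||protos[j] - f_unlabel[s]||_2), shape (c, N_u)."""
    dist = torch.cdist(protos, f_unlabel)           # pairwise Euclidean distances
    return torch.exp(-dist)                         # values in (0, 1]
```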
S56, the feature vector in the step S53Mapping to vector +.>As a result of the prediction,is>
For the followingLabeled points, class prediction is directly realized by using a Softmax classifier and a cross entropy loss function, and the loss function calculated by the points is +.>
For the followingThe unlabeled dots are first generated into pseudo-labels and then used for sum +.>Comparison, specific: first selecting similarity matrix category by category>Highest confidence +.>Dots, assume common selection->Dots (/ -)>≤/>≤/>) Then selecting the category with highest confidence level for the selected points point by point, and updating the category>Maximum confidence of pseudo tags of the points and corresponding tag values;
will beThe predictive loss function of each unlabeled dot is designed to:
wherein the subscriptRepresentation->Any one of the points, s represents +.>Index of individual unlabeled dots, +.>Is->The number of categories contained in the list, m represents the index of the number of categories,/->Probability value representing final predictive label, +.>Representation ofA pseudo tag class, wherein, when the pseudo tag class and the predicted class are the same,mtaking 1, otherwise taking 0,>indicating when the point is +.>Taking 1 when in, otherwise taking 0;
S57, the loss function L_total of the whole graph convolution network is:

$$L_{total} = L_{label} + \gamma L_{unlabel}$$

where the weight γ increases with the training round: with epoch denoting the current training round and max-epoch the maximum training round, a smaller weight γ is used at the beginning of training.
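A sketch of the combined loss of step S57; since the text states only that a smaller weight is used early in training, the linear ramp gamma = epoch / max_epoch is an assumption, as are the function and argument names.

```python
import torch
import torch.nn.functional as F

def total_loss(logits_l, y_l, logits_u, pseudo_y, pseudo_mask, epoch, max_epoch):
    """L_total = L_label + gamma * L_unlabel with a ramp-up weight.

    logits_l/y_l: labeled-point scores and labels; logits_u/pseudo_y: unlabeled
    scores and pseudo labels; pseudo_mask: the N_pse selected points.
    """
    l_label = F.cross_entropy(logits_l, y_l)
    if pseudo_mask.any():
        # cross entropy only over the selected pseudo-labeled points
        l_unlabel = F.cross_entropy(logits_u[pseudo_mask], pseudo_y[pseudo_mask])
    else:
        l_unlabel = logits_u.sum() * 0.0   # keep the graph differentiable
    gamma = epoch / max_epoch              # assumed ramp-up schedule
    return l_label + gamma * l_unlabel
```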
Further, the step S6 specifically comprises:
Iterating the pseudo-label assignment and training process of steps S51-S57 until the network converges on the target dataset D; in the final prediction, the trained network is used with the computation of the similarity matrix simi removed, a Softmax classifier is applied to all points of the current input P, and the remaining points of the dataset are read in and iterated over, realizing semantic segmentation and class prediction for all points of the target dataset D.
After the above technical scheme is adopted, the invention has the following advantages over the background art:
1. The invention adopts the idea of transfer learning and makes full use of the similar characteristics of different urban scenes: the disclosed labeled dataset is used to obtain the initialization parameters of the graph convolution network, which helps improve the stability of the neural network's representations across different datasets;
2. The invention adopts a self-learning pre-training task that makes full use of the local and color characteristics of objects in urban scenes; without using any labeled data, it can learn the prior distribution of the data to fine-tune the network parameters;
3. The invention uses semi-supervised learning to reduce the dependence on labeled data, so that high-quality pseudo labels can be generated from a small amount of labeled data, improving the effect of semi-supervised learning, realizing semantic segmentation of the target dataset, and greatly reducing the dependence on manually labeled data.
Drawings
FIG. 1 is a flow chart of pre-training the graph convolution network and fine-tuning its parameters according to the present invention;
fig. 2 is a flowchart of a training process for generating a pseudo tag by the semi-supervised learning network of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Examples
The urban scene semantic segmentation method based on graph convolution and a semi-supervised learning network comprises the following steps:
(I) Obtaining initialization parameters by pre-training the graph convolution network (realized by the following step S1)
S1, pre-training a graph convolution network by using a public and labeled data set to obtain initialization parameters of each layer in the graph convolution network;
the encoder of the graph rolling network consists of a multi-layer perceptron layer and four graph rolling modules, wherein the four graph rolling modules are numbered as 1,2,3 and 4 in sequence; the input features of the graph convolution module are expressed asThe output characteristic is->, wherein />I.e. the output of the previous convolution module is the input of the next convolution module and the current module input point is +.>Input feature dimension is->The characteristic dimension after convolution is +.>The number of points is reduced to +.>
The embodiment adopts public and marked urban road data as a pre-training data set for pre-training. The Toronto3D of the public data set acquires a 1km high-quality city market scenery spot cloud by using a vehicle-mounted laser radar, and the data set manually marks more than eight millions of spots, so that 8 common city scene categories are covered: road, zebra crossing, building, power line, power tower and automobileAnd a fence, and all contain coordinate and color information. Selecting a point cloud in Toronto3DAs input to the pre-trained graph convolution network, in this embodiment +.>If the number of points (1) is 65536 +.>Is [65536, 6 ]]。
First, an encoder is used to obtain encoding featuresWherein MLP (i.e. multi-layer perceptron) is used to determine [65536, 6]Mapping to [65536,16 ]]Then inputting the extracted features into four graph convolution modules; then the decoding characteristics are obtained through a decoder>Its dimension is [65536,16 ]]The method comprises the steps of carrying out a first treatment on the surface of the Then, the pair ++is implemented using fully connected network and Softmax classifier>The classification forecast of each point in the fully connected network is set as (16, 64, 64, 8) for the characteristic dimension change of each point; then, cross entropy is used as a loss function, random gradient descent is optimized, the network is pre-trained, and parameters of each layer of the network are updated; finally, the above process is repeated until the network converges. The convergence condition is set to be 100 rounds of fixed training, and if the prediction accuracy of 20 continuous rounds is not improved, the training can be stopped. Instead of random initialization, the network of the present embodiment employs pre-trained parameter initialization.
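The pre-training loop can be sketched as follows. The per-point MLP backbone below is only a shape-compatible stand-in for the graph convolution encoder-decoder (which the patent describes but does not list), and the random tensors stand in for a labeled Toronto3D block.

```python
import torch
import torch.nn as nn

# Stand-in for the graph-convolution encoder-decoder: a per-point MLP mapping
# 6-dim inputs to 16-dim decoder features, mirroring tensor shapes only.
backbone = nn.Sequential(nn.Linear(6, 16), nn.ReLU(), nn.Linear(16, 16))
head = nn.Sequential(nn.Linear(16, 64), nn.ReLU(),
                     nn.Linear(64, 64), nn.ReLU(),
                     nn.Linear(64, 8))          # per-point (16, 64, 64, 8) classifier

opt = torch.optim.SGD(list(backbone.parameters()) + list(head.parameters()), lr=0.01)
points = torch.rand(65536, 6)                   # placeholder for a Toronto3D block
labels = torch.randint(0, 8, (65536,))          # placeholder labels

for epoch in range(100):                        # fixed 100 rounds in the embodiment
    logits = head(backbone(points))             # (65536, 8) per-point class scores
    loss = nn.functional.cross_entropy(logits, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```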
(II) Performing the self-learning training task to fine-tune parameters (realized by the following steps S2-S4)
S2, inputting an original point set into the initialized graph rolling network at one time,/>The points in (a) contain only the coordinate xyz and the color rgb information, and the feature vector +.>
S3, the original point set in the step S2Using k-NN to find k adjacent points to form neighborhood, calculating feature vector according to neighborhood of each point>
S4, calculating a feature vector and />As a loss function for adjusting the parameters of the graph roll-up network in step S2;
the step S2 specifically includes:
since step S1 has initialized the layer parameters of the graph rolling network, step S2 uses only the encoder and decoder portions thereof, modifies the fully connected network and Softmax classifier in step S1 to the MLP layer, and then performs the following steps.
S21, using the encoder of the graph rolling network to perform the initial point setCoding to obtain coding feature->
In particular, for an original set of points that are input into the graph rolling network at one timeWherein the dots contain only 6-dimensional features of the coordinates xyz and the colors rgb, denoted as feature +.>Will->Mapping to 16 dimension by a multi-layer perceptron, the output is characterized by +>And is used as the input of the first graph rolling module, the output of the former graph rolling module is the input of the next graph rolling module, and the characteristic is output after passing through the four graph rolling modules ∈ ->I.e. the final coding feature +.>
S22, reusing the decoder pair coding features of the graph rolling networkDecoding to obtain decoding characteristic->
The coding features obtained in S21Feature ∈The same-dimensional mapping using MLP>And as decoder input, decoding characteristics +.>Decoding after the up-sampling of the adjacent points and the jump connection of the MLP down-sum encoder to obtain the output characteristic +.>. Wherein the decoding feature uses the subscript +.>And superscript->In distinction to the features of the encoder,and sequentially taking 4,3,2 and 1. The encoder is jump-connected to the decoder, i.e. has coding features of the same dimension +.>Decoding characteristics->The added features are used as input features for subsequent layers. Decoding characteristics->Namely decoding characteristic->
S23, decoding the characteristics through MLPThe mapping output is the feature vector +.>The method comprises the steps of carrying out a first treatment on the surface of the Wherein the feature vector->The dimension of each point in (a) is expressed as +.>,/> and />Representing the encoded coordinates and color features, r, respectivelyThe subscript indicates the characteristic channel, 1 in the superscript of r indicates the mean, and 2 indicates the variance.
The step S3 specifically comprises:
First, for each point of the original point set P in step S2, k-NN is used to find the k nearest points, which form its neighborhood;
Specifically, for each point p_i of the original point set P input to the network, a set of nearest neighbor points is found using k-NN, and the coordinate information is then embedded:

f_n = LBR(p_i, p_i^n, p_i − p_i^n, ‖p_i − p_i^n‖)

where the coordinate feature f_n is obtained from the spatial position relation between the point p_i and its neighboring point p_i^n; specifically, the absolute coordinates p_i and p_i^n of the two points, their offset p_i − p_i^n and their spatial distance ‖p_i − p_i^n‖ are concatenated. The symbol LBR denotes that the concatenated feature vector passes through a Linear layer, a BatchNorm layer and a ReLU layer in sequence; within each graph convolution module, f_n is mapped to the same dimension as the point-set features input to that module.
Then, the relation between the point p_i and the neighboring point p_i^n is expressed as the edge relation e:

e = R(g(F_in, f_n))

where the point-set feature F_in input to the graph convolution module and its coordinate feature f_n are concatenated and then weighted with a learnable weight g, which can be realized with an MLP, a 1D-CNN, etc.; R denotes the ReLU layer. Finally, the edge features of each point are aggregated channel by channel using Max-Pooling, and random sampling is used to reduce the number of points, giving the output feature F_out.
Feature vectors F_c are then calculated from the neighborhood of each point, where the dimension of each point is expressed as (c̄₁, c̄₂, c̄₃, r̄₁, r̄₂, r̄₃, σ₁², σ₂², σ₃²).
The calculation process is as follows:

$$\bar{c}_i = \frac{1}{k}\sum_{n=1}^{k} c_i^n, \qquad \bar{r}_i = \frac{1}{k}\sum_{n=1}^{k} r_i^n, \qquad \sigma_i^2 = \frac{1}{k}\sum_{n=1}^{k}\left(r_i^n - \bar{r}_i\right)^2$$

where c̄_i denotes the mean of the neighborhood coordinate channels of each point, r̄_i denotes the mean of the neighborhood color channels of each point, σ_i² denotes the variance of the neighborhood color channels of each point, i takes 1, 2, 3 so that the self-learning process pairs each feature channel with a computed feature distance, and n is the index over the k neighboring points of each point. Because urban street scene point clouds are sparsely and unevenly distributed, the neighborhood coordinate variance is large, so the network uses only the coordinate mean. Objects such as vegetation (green) and pavement (gray) have distinct color characteristics, and local color variation is generally smooth, so both the mean and the variance are used for color.
The step S4 specifically comprises:
Assume that the original point set P input into the graph convolution network in step S2 contains N points; the coordinate distance is computed as the Euclidean distance and the color distance as the Manhattan distance;
The loss function of the coordinate distance L_xyz is:

$$L_{xyz} = \frac{1}{N}\sum_{\alpha=1}^{N}\left\| c_{\alpha}^{o} - c_{\alpha}^{c} \right\|_{2}$$

The loss function of the color distance L_rgb is:

$$L_{rgb} = \frac{1}{N}\sum_{\alpha=1}^{N}\left\| r_{\alpha}^{o} - r_{\alpha}^{c} \right\|_{1}$$

Finally, the loss function is:

$$L = \lambda_{1} L_{xyz} + \lambda_{2} L_{rgb}$$

where α is the index of each point in the original point set P, c^o and r^o are the coordinate and color parts of F_o, c^c and r^c are the coordinate and color parts of F_c, and λ₁ and λ₂ are two hyperparameters of the graph convolution network, set to 1/3 and 2/3 respectively; the loss function is used to train the graph convolution network in step S2 to further adjust the parameters of its encoder and decoder.
Steps S2-S4 realize the fine-tuning of the parameters of the pre-trained graph convolution network. Specifically, in this embodiment:
(1) The pre-trained encoder and decoder are fixed first, and the fully connected layer after the decoder is replaced by a multi-layer perceptron (MLP) whose per-point feature dimension change is set to (16, 32, 9). A point cloud P is constructed from the target semantic segmentation dataset D (constructed in the same way as in the pre-training step above; only the dataset changes), and the point cloud passes through the network of FIG. 1 to output the feature F_o, whose dimension is [65536, 9].
(2) At the same time, a neighborhood is constructed for each point of P using k-NN, with the number of neighborhood points k set to 16. The feature of one point is computed as:

$$\bar{c}_i = \frac{1}{k}\sum_{n=1}^{k} c_i^n, \qquad \bar{r}_i = \frac{1}{k}\sum_{n=1}^{k} r_i^n, \qquad \sigma_i^2 = \frac{1}{k}\sum_{n=1}^{k}\left(r_i^n - \bar{r}_i\right)^2$$

where c̄_i denotes the mean of the neighborhood coordinate channels of the point, r̄_i the mean of its neighborhood color channels, and σ_i² the variance of its neighborhood color channels; i takes 1, 2, 3, so that each feature channel corresponds to a computed feature distance, and n is the index over the k neighboring points. The feature of this point is expressed as (c̄₁, c̄₂, c̄₃, r̄₁, r̄₂, r̄₃, σ₁², σ₂², σ₃²); all points of the constructed P then give the feature F_c, whose dimension is [65536, 9].
(3) The distance between F_o and F_c is calculated as the loss for training the network of FIG. 1. Feature distances associated with coordinates use the Euclidean distance, and feature distances associated with color use the Manhattan distance. The coordinate distance loss function L_xyz is:

$$L_{xyz} = \frac{1}{N}\sum_{\alpha=1}^{N}\left\| c_{\alpha}^{o} - c_{\alpha}^{c} \right\|_{2}$$

The color distance loss function L_rgb is:

$$L_{rgb} = \frac{1}{N}\sum_{\alpha=1}^{N}\left\| r_{\alpha}^{o} - r_{\alpha}^{c} \right\|_{1}$$

Finally, the loss function is:

$$L = \lambda_{1} L_{xyz} + \lambda_{2} L_{rgb}$$

where α is the index of each point in the original point set P, and λ₁ and λ₂ are two hyperparameters set to 1/3 and 2/3.
Training is optimized with stochastic gradient descent for a fixed 30 rounds. This pre-training fine-tunes the parameters of the encoder and decoder to adapt them to the encoding of the dataset D.
(III) Generating pseudo labels and performing semantic segmentation with the semi-supervised learning network (realized by the following steps S5 and S6)
S5, taking the original point set P as the target semantic segmentation dataset D, which contains a small amount of labeled data and a large amount of unlabeled data, where the labeled data accounts for 1%-10% of the points of the original point set P, and then using the labeled data to assign pseudo labels to the unlabeled data in the semi-supervised learning network;
S6, using the network with the pseudo labels assigned in step S5 for inference, performing semantic segmentation and predicting the category of each point.
The step S5 specifically comprises:
S51, taking the original point set P as the target semantic segmentation dataset D; D is a set containing N points; in the original point set P, the point set with labeled data is denoted P_L with N_L points, and the point set of unlabeled data is denoted P_U with N_U points, so that P_L ∪ P_U = D and N_L + N_U = N;
S52, using the encoder and decoder trained and adjusted in step S4, replacing the MLP outputting 9 dimensions in step S4 with an MLP outputting d dimensions, and denoting the output d-dimensional vector of each point as f;
S53, denoting the feature corresponding to a labeled point in D as f^L and the feature corresponding to an unlabeled point as f^U, so that the features of D are {f^L, f^U}; f^L and f^U are both d-dimensional vectors with subscripts distinguishing different points, and D contains c categories indexed by j, 0 < j ≤ c, where c is the actual number of categories to be semantically segmented;
S54, selecting from the known labeled data the points belonging to category j and computing the feature average of these points to obtain the class average feature vector f̄_j:

$$\bar{f}_j = \frac{1}{N_j}\sum_{y_l = j} f_l^{L}$$

where N_j denotes the number of points whose category is j and y_l = j indicates that the category of the point corresponding to f_l^L is j; then, for the N_L labeled points of the input, an average feature vector f̄_j is computed for each category present, 0 < j ≤ c; for the remaining categories not present among the labeled points, f̄_j is recorded as the zero vector;
S55, computing the similarity matrix simi between the feature vectors f_s^U of the unlabeled points and f̄_j:

$$simi_s^j = e^{-\left\| \bar{f}_j - f_s^{U} \right\|_2}$$

where ‖f̄_j − f_s^U‖₂ is the Euclidean distance between the average feature vector of category j and the vector corresponding to the unlabeled point s; the superscript of simi denotes the category, the subscript denotes a certain point, 0 < simi_s^j ≤ 1, e is the base of the natural logarithm with the bracketed term as its exponent, and the dimension of simi is c × N_U;
S56, mapping the feature vector f in step S53 to the vector p as the prediction result, where p for each point is a c-dimensional vector.
For the N_L labeled points, class prediction is directly realized using a Softmax classifier and a cross-entropy loss function, and the loss computed on these points is L_label.
For the N_U unlabeled points, pseudo labels are first generated and then compared with p; if points whose pseudo labels have low confidence were used, larger errors would be introduced into the segmentation result. Therefore, the N_max points with the highest confidence can be selected from the similarity matrix simi category by category, and the category with the highest confidence is then chosen point by point for the selected points.
Specifically: the N_max points with the highest confidence are selected from the similarity matrix simi category by category; assume N_pse points are selected in total, N_max ≤ N_pse ≤ c·N_max; the category with the highest confidence is selected point by point for the selected points, and the maximum confidence of the pseudo labels of these N_pse points and the corresponding label values are updated;
The predictive loss function of the unlabeled points is designed as:

$$L_{unlabel} = -\sum_{s=1}^{N_U} \mathbb{1}\left(s \in N_{pse}\right) \sum_{m=1}^{c} y_{s,m} \log p_{s,m}$$

where the subscript s denotes any one of the N_U unlabeled points and serves as their index, c is the number of categories contained in D, m is the index over the categories, p_{s,m} is the probability value of the final predicted label, y_{s,m} represents the pseudo label category and takes 1 when the pseudo label category and the predicted category m are the same, otherwise 0, and 𝟙(s ∈ N_pse) takes 1 when the point is among the N_pse selected points, otherwise 0;
S57, the loss function L_total of the whole graph convolution network is:

$$L_{total} = L_{label} + \gamma L_{unlabel}$$

where the weight γ increases with the training round: with epoch denoting the current training round and max-epoch the maximum training round, a smaller weight γ is used at the beginning of training.
The step S6 specifically comprises:
Iterating the pseudo-label assignment and training process of steps S51-S57 until the network converges on the target dataset D; in the final prediction, the trained network is used with the computation of the similarity matrix simi removed, a Softmax classifier is applied to all points of the current input P, and the remaining points of the dataset are read in and iterated over, realizing semantic segmentation and class prediction for all points of the target dataset D.
Specifically, this embodiment modifies the layer after the decoder in FIG. 1 into two MLPs: the per-point feature dimension change of the first MLP is set to (16, 32, 32), outputting the feature f with dimension [65536, 32]; the per-point feature dimension change of the second MLP is set to (32, 32, 8), outputting the feature p with dimension [65536, 8]. The modified network architecture is shown in FIG. 2.
The target semantic segmentation dataset D must contain a small amount of labeled data and a large amount of unlabeled data, and in step S5 the labeled data is used to assign pseudo labels to the unlabeled data. A point cloud P is constructed from the target semantic segmentation dataset D (in the same way as in the self-learning pre-training task above), and each P constructed during semi-supervised training must have 1%-10% of its points carrying label information. For example, in one training round, the 65536 points contain 4096 labeled points spanning 5 categories, so the labeled share is 6.25%. Using f^L, the feature average of the labeled points of each category is computed to obtain the class average feature vector f̄_j:

$$\bar{f}_j = \frac{1}{N_j}\sum_{y_l = j} f_l^{L}$$

where y_l = j indicates that a labeled point belongs to category j with corresponding feature f_l^L. Then, for the 4096 labeled points of the input, an average feature vector f̄_j is computed for each of the 5 categories present, where 0 < j ≤ 8; for the remaining 3 absent categories, f̄_j is recorded as the zero vector.
Next, the similarity matrix simi between the feature vectors f_s^U of the unlabeled points and f̄_j is computed:

$$simi_s^j = e^{-\left\| \bar{f}_j - f_s^{U} \right\|_2}$$

where 0 < simi_s^j ≤ 1, and the dimension of simi is [8, 61440].
Then, the feature vector f is mapped to the vector p as the prediction result, where p for each point is an 8-dimensional vector. The 4096 labeled points directly realize class prediction using a Softmax classifier and a cross-entropy loss function. For the 61440 unlabeled points, however, pseudo labels need to be generated for comparison with p. First, the N_max points with the highest confidence are selected from the similarity matrix simi category by category, and the category with the highest confidence is then selected point by point for the selected points. Assume N_pse points are selected in total, N_max ≤ N_pse ≤ 8·N_max; the maximum confidence of the pseudo labels of these N_pse points and the corresponding label values are then updated. N_max is set to 50% of the number of points of each category. The number of training rounds is set to 100 and Adam is selected as the optimization method.
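The per-class pseudo-label selection just described can be sketched as follows; `per_class_n` (the 50%-of-each-category quota) is assumed to be precomputed, and the function name is illustrative.

```python
import torch

def select_pseudo_labels(simi, per_class_n):
    """Per-class top-confidence selection from simi (c, N_u): for each
    category, take that category's most confident points, then assign each
    selected point the category with the highest confidence.

    per_class_n: per-category selection quota, e.g. a list of ints.
    Returns (mask, pseudo_y): boolean mask over unlabeled points and labels.
    """
    c, n_u = simi.shape
    mask = torch.zeros(n_u, dtype=torch.bool)
    for j in range(c):
        k = min(per_class_n[j], n_u)                # quota for category j
        mask[torch.topk(simi[j], k).indices] = True # most confident points of class j
    pseudo_y = simi.argmax(dim=0)                   # highest-confidence category per point
    return mask, pseudo_y
```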
Finally, the computation of the similarity matrix simi is removed from the trained network, i.e. the computation from f^U to simi and from simi to the pseudo labels is removed. Then, for the point cloud P input to the network at test time, a Softmax classifier is used for all its points, and P is constructed iteratively until all points of D have been read and their label values predicted, realizing the semantic segmentation.
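A minimal inference sketch of this final step, with `model` standing in for the trained network of FIG. 2 (similarity branch removed) and `blocks` for the iteratively constructed 65536-point inputs:

```python
import torch

@torch.no_grad()
def predict_labels(model, blocks):
    """Apply the trained network block by block until all of D is read,
    returning the predicted label value of every point."""
    preds = []
    for pts in blocks:                  # each pts: (65536, 6) point cloud P
        logits = model(pts)             # (65536, 8) per-point class scores
        preds.append(logits.softmax(dim=1).argmax(dim=1))
    return torch.cat(preds)
```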
The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (4)

1. The urban scene semantic segmentation method based on graph convolution and a semi-supervised learning network is characterized by comprising the following steps:
S1, pre-training a graph convolution network with a public, labeled urban road dataset to obtain the initialization parameters of each layer in the graph convolution network;
S2, inputting an original point set P into the initialized graph convolution network at one time, where the points in P contain only coordinate xyz and color rgb information, and outputting the feature vector F_o;
The step S2 specifically comprises the following steps:
S21, encoding the original point set P with the encoder of the graph convolution network to obtain the encoding feature F_E; for the original point set P input once into the graph convolution network, where the points contain only the 6-dimensional features of the coordinates xyz and the colors rgb, denoted as the feature F_0, F_0 is mapped to 16 dimensions through an MLP, the output feature serves as the input of the first graph convolution module, the output of each graph convolution module is the input of the next, and the feature output after the four graph convolution modules is the final encoding feature F_E;
S22, decoding the encoding feature F_E with the decoder of the graph convolution network to obtain the decoding feature F_D;
S23, mapping the decoding feature F_D through an MLP to the feature vector F_o, where the dimension of each point in F_o is expressed as (x̄, ȳ, z̄, r₁¹, r₂¹, r₃¹, r₁², r₂², r₃²), with (x̄, ȳ, z̄) and the r terms representing the encoded coordinate and color features respectively; the subscript of r denotes the feature channel, a superscript 1 denotes the mean, and a superscript 2 denotes the variance;
S3, for each point of the original point set P in step S2, using k-NN to find its k nearest points to form a neighborhood, and computing the feature vector F_c from the neighborhood of each point;
S4, computing the distance between the feature vectors F_o and F_c as a loss function for adjusting the parameters of the graph convolution network in step S2;
The step S4 specifically comprises:
Assume that the original point set P input into the graph convolution network in step S2 contains N points; the coordinate distance is computed as the Euclidean distance and the color distance as the Manhattan distance;
The loss function of the coordinate distance L_xyz is:

$$L_{xyz} = \frac{1}{N}\sum_{\alpha=1}^{N}\left\| c_{\alpha}^{o} - c_{\alpha}^{c} \right\|_{2}$$

The loss function of the color distance L_rgb is:

$$L_{rgb} = \frac{1}{N}\sum_{\alpha=1}^{N}\left\| r_{\alpha}^{o} - r_{\alpha}^{c} \right\|_{1}$$

Finally, the loss function is:

$$L = \lambda_{1} L_{xyz} + \lambda_{2} L_{rgb}$$

where α is the index of each point in the original point set P, c_α^o and r_α^o are the coordinate and color parts of F_o, c_α^c and r_α^c are the coordinate and color parts of F_c, and λ₁ and λ₂ are two hyperparameters of the graph convolution network, set to 1/3 and 2/3 respectively; the loss function is used to train the graph convolution network in step S2 and further adjust the parameters of its encoder and decoder;
S5, taking the original point set P as the target semantic segmentation dataset D, which contains labeled data and unlabeled data, where the labeled data accounts for 1%-10% of the points of the original point set P, and then using the labeled data to assign pseudo labels to the unlabeled data in the semi-supervised learning network;
S6, using the network with the pseudo labels assigned in step S5 for inference, performing semantic segmentation and predicting the category of each point.
2. The urban scene semantic segmentation method based on graph convolution and a semi-supervised learning network as set forth in claim 1, wherein the step S3 specifically comprises:
For each point of the original point set P in step S2, using k-NN to find its k nearest points to form a neighborhood, and computing the feature vector F_c from the neighborhood of each point, where the dimension of each point is expressed as (c̄₁, c̄₂, c̄₃, r̄₁, r̄₂, r̄₃, σ₁², σ₂², σ₃²);
The calculation process is as follows:

$$\bar{c}_i = \frac{1}{k}\sum_{n=1}^{k} c_i^n, \qquad \bar{r}_i = \frac{1}{k}\sum_{n=1}^{k} r_i^n, \qquad \sigma_i^2 = \frac{1}{k}\sum_{n=1}^{k}\left(r_i^n - \bar{r}_i\right)^2$$

where c̄_i denotes the mean of the neighborhood coordinate channels of each point, r̄_i denotes the mean of the neighborhood color channels of each point, σ_i² denotes the variance of the neighborhood color channels of each point, i takes 1, 2, 3 so that each feature channel of F_o corresponds to one computed feature distance, and n is the index over the k neighboring points of each point.
3. The urban scene semantic segmentation method based on graph convolution and a semi-supervised learning network as set forth in claim 2, wherein the step S5 specifically comprises:
S51, taking the original point set P as the target semantic segmentation dataset D; D is a set containing N points; in the original point set P, the point set with labeled data is denoted P_L with N_L points, and the point set of unlabeled data is denoted P_U with N_U points, so that P_L ∪ P_U = D and N_L + N_U = N;
S52, using the encoder and decoder trained and adjusted in step S4, replacing the MLP outputting 9 dimensions in step S4 with an MLP outputting d dimensions, and denoting the output d-dimensional vector of each point as f;
S53, denoting the feature corresponding to a labeled point in D as f^L and the feature corresponding to an unlabeled point as f^U, so that the features of D are {f^L, f^U}, where f^L and f^U are both d-dimensional vectors and subscripts are used to distinguish different points; D contains c categories indexed by j, 0 < j ≤ c, where c is the actual number of categories to be semantically segmented;
S54, selecting from the known labeled data the points belonging to category j and computing the feature average of these points to obtain the class average feature vector f̄_j:

$$\bar{f}_j = \frac{1}{N_j}\sum_{y_l = j} f_l^{L}$$

where N_j denotes the number of points whose category is j and y_l = j indicates that the category of the point corresponding to f_l^L is j; then, for the N_L labeled points of the input, an average feature vector f̄_j is computed for each category present, 0 < j ≤ c; for the remaining categories not present among the labeled points, f̄_j is recorded as the zero vector;
S55, computing the similarity matrix simi between the feature vectors f_s^U of the N_U unlabeled points and f̄_j:

$$simi_s^j = e^{-\left\| \bar{f}_j - f_s^{U} \right\|_2}$$

where ‖f̄_j − f_s^U‖₂ is the Euclidean distance between the average feature vector of category j and the vector corresponding to the unlabeled point s; the superscript of simi denotes the category, the subscript of simi denotes any one of the points, 0 < simi_s^j ≤ 1, e is the base of the natural logarithm with the bracketed term as its exponent, and the dimension of simi is c × N_U;
S56, mapping the feature vector f in step S53 to the vector p as the prediction result, where p for each point is a c-dimensional vector;
For the N_L labeled points, class prediction is directly realized using a Softmax classifier and a cross-entropy loss function, and the loss computed on these points is L_label;
For the N_U unlabeled points, pseudo labels are first generated and then compared with p, specifically: first, the N_max points with the highest confidence are selected from the similarity matrix simi category by category; assume N_pse points are selected in total, N_max ≤ N_pse ≤ c·N_max; then the category with the highest confidence is selected point by point for the selected points, and the maximum confidence of the pseudo labels of these N_pse points and the corresponding label values are updated;
The predictive loss function of the unlabeled points is designed as:

$$L_{unlabel} = -\sum_{s=1}^{N_U} \mathbb{1}\left(s \in N_{pse}\right) \sum_{m=1}^{c} y_{s,m} \log p_{s,m}$$

where the subscript s denotes any one of the N_U unlabeled points and serves as their index, c is the number of categories contained in D, m is the index over the categories, p_{s,m} is the probability value of the final predicted label, y_{s,m} represents the pseudo label category and takes 1 when the pseudo label category and the predicted category m are the same, otherwise 0, and 𝟙(s ∈ N_pse) takes 1 when the point is among the N_pse selected points, otherwise 0;
S57, the loss function L_total of the whole graph convolution network is:

$$L_{total} = L_{label} + \gamma L_{unlabel}$$

where the weight γ increases with the training round: with epoch denoting the current training round and max-epoch the maximum training round, a smaller weight γ is used at the beginning of training.
4. The urban scene semantic segmentation method based on graph convolution and a semi-supervised learning network as set forth in claim 3, wherein the step S6 specifically comprises:
Iterating the pseudo-label assignment and training process of steps S51-S57 until the network converges on the target dataset D; in the final prediction, the trained network is used with the computation of the similarity matrix simi removed, a Softmax classifier is applied to all points of the current input P, and the remaining points of the dataset are read in and iterated over, realizing semantic segmentation and class prediction for all points of the target dataset D.
CN202310596881.7A 2023-05-25 2023-05-25 Urban scene semantic segmentation method based on graph convolution and semi-supervised learning network Active CN116310350B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310596881.7A CN116310350B (en) 2023-05-25 2023-05-25 Urban scene semantic segmentation method based on graph convolution and semi-supervised learning network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310596881.7A CN116310350B (en) 2023-05-25 2023-05-25 Urban scene semantic segmentation method based on graph convolution and semi-supervised learning network

Publications (2)

Publication Number Publication Date
CN116310350A CN116310350A (en) 2023-06-23
CN116310350B true CN116310350B (en) 2023-08-18

Family

ID=86785552

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310596881.7A Active CN116310350B (en) 2023-05-25 2023-05-25 Urban scene semantic segmentation method based on graph convolution and semi-supervised learning network

Country Status (1)

Country Link
CN (1) CN116310350B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116863432B (en) * 2023-09-04 2023-12-22 之江实验室 Weak supervision laser travelable region prediction method and system based on deep learning
CN117576217B (en) * 2024-01-12 2024-03-26 电子科技大学 Object pose estimation method based on single-instance image reconstruction

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112070779A (en) * 2020-08-04 2020-12-11 武汉大学 Remote sensing image road segmentation method based on convolutional neural network weak supervised learning
CN112785611A (en) * 2021-01-29 2021-05-11 昆明理工大学 3D point cloud weak supervision semantic segmentation method and system
CN112861722A (en) * 2021-02-09 2021-05-28 中国科学院地理科学与资源研究所 Remote sensing land utilization semantic segmentation method based on semi-supervised depth map convolution
CN113936217A (en) * 2021-10-25 2022-01-14 华中师范大学 Priori semantic knowledge guided high-resolution remote sensing image weakly supervised building change detection method
CN114187446A (en) * 2021-12-09 2022-03-15 厦门大学 Cross-scene contrast learning weak supervision point cloud semantic segmentation method
US11450008B1 (en) * 2020-02-27 2022-09-20 Amazon Technologies, Inc. Segmentation using attention-weighted loss and discriminative feature learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113570610B (en) * 2021-07-26 2022-05-13 北京百度网讯科技有限公司 Method and device for performing target segmentation on video by adopting semantic segmentation model

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11450008B1 (en) * 2020-02-27 2022-09-20 Amazon Technologies, Inc. Segmentation using attention-weighted loss and discriminative feature learning
CN112070779A (en) * 2020-08-04 2020-12-11 武汉大学 Remote sensing image road segmentation method based on convolutional neural network weak supervised learning
CN112785611A (en) * 2021-01-29 2021-05-11 昆明理工大学 3D point cloud weak supervision semantic segmentation method and system
CN112861722A (en) * 2021-02-09 2021-05-28 中国科学院地理科学与资源研究所 Remote sensing land utilization semantic segmentation method based on semi-supervised depth map convolution
CN113936217A (en) * 2021-10-25 2022-01-14 华中师范大学 Priori semantic knowledge guided high-resolution remote sensing image weakly supervised building change detection method
CN114187446A (en) * 2021-12-09 2022-03-15 厦门大学 Cross-scene contrast learning weak supervision point cloud semantic segmentation method

Also Published As

Publication number Publication date
CN116310350A (en) 2023-06-23

Similar Documents

Publication Publication Date Title
CN116310350B (en) Urban scene semantic segmentation method based on graph convolution and semi-supervised learning network
CN111612066B (en) Remote sensing image classification method based on depth fusion convolutional neural network
CN110796168A (en) Improved YOLOv 3-based vehicle detection method
CN112507793A (en) Ultra-short-term photovoltaic power prediction method
CN112149547B (en) Remote sensing image water body identification method based on image pyramid guidance and pixel pair matching
CN112541355B (en) Entity boundary type decoupling few-sample named entity recognition method and system
CN111914611B (en) Urban green space high-resolution remote sensing monitoring method and system
CN113487066A (en) Long-time-sequence freight volume prediction method based on multi-attribute enhanced graph convolution-Informer model
CN113449594A (en) Multilayer network combined remote sensing image ground semantic segmentation and area calculation method
CN113256649B (en) Remote sensing image station selection and line selection semantic segmentation method based on deep learning
CN115482491B (en) Bridge defect identification method and system based on transformer
CN111967325A (en) Unsupervised cross-domain pedestrian re-identification method based on incremental optimization
CN112712052A (en) Method for detecting and identifying weak target in airport panoramic video
CN114299286A (en) Road scene semantic segmentation method based on category grouping in abnormal weather
Tian et al. Semantic segmentation of remote sensing image based on GAN and FCN network model
CN117237660A (en) Point cloud data processing and segmentation method based on deep learning feature aggregation
CN111368843B (en) Method for extracting lake on ice based on semantic segmentation
CN117011701A (en) Remote sensing image feature extraction method for hierarchical feature autonomous learning
CN116524197A (en) Point cloud segmentation method, device and equipment combining edge points and depth network
CN115965867A (en) Remote sensing image earth surface coverage classification method based on pseudo label and category dictionary learning
CN114694019A (en) Remote sensing image building migration extraction method based on anomaly detection
CN116071661B (en) Urban road scene semantic segmentation method based on laser point cloud
CN116452794B (en) Directed target detection method based on semi-supervised learning
CN113421269B (en) Real-time semantic segmentation method based on double-branch deep convolutional neural network
CN117152427A (en) Remote sensing image semantic segmentation method and system based on diffusion model and knowledge distillation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant