CN116310350A - Urban scene semantic segmentation method based on graph convolution and semi-supervised learning network - Google Patents

Urban scene semantic segmentation method based on graph convolution and semi-supervised learning network Download PDF

Info

Publication number
CN116310350A
CN116310350A (application CN202310596881.7A)
Authority
CN
China
Prior art keywords
point
points
network
category
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310596881.7A
Other languages
Chinese (zh)
Other versions
CN116310350B (en)
Inventor
王程 (Wang Cheng)
陈钧 (Chen Jun)
陈一平 (Chen Yiping)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN202310596881.7A priority Critical patent/CN116310350B/en
Publication of CN116310350A publication Critical patent/CN116310350A/en
Application granted granted Critical
Publication of CN116310350B publication Critical patent/CN116310350B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an urban scene semantic segmentation method based on a graph convolution and semi-supervised learning network, which comprises the following steps: S1, pre-training a graph convolution network to obtain initialization parameters; S2, inputting an original point set P at one time and outputting a feature vector F1; S3, for the original point set P, computing a feature vector F2 from the neighborhood of each point; S4, calculating the distance between the feature vectors F1 and F2 as a loss function to adjust the parameters of the graph convolution network; S5, using the labeled data to assign pseudo labels to the unlabeled data; S6, performing semantic segmentation on the pseudo-labeled dataset and predicting the category of each point. The method can realize semantic segmentation of urban road scenes with only a small amount of labeled data.

Description

Urban scene semantic segmentation method based on graph convolution and semi-supervised learning network
Technical Field
The invention relates to the field of computer graphics, in particular to an urban scene semantic segmentation method based on a graph convolution and semi-supervised learning network.
Background
As unstructured three-dimensional data, point clouds characterize objects more accurately and flexibly than data formats such as voxels and meshes, and they are widely used in the field of smart cities. For example, in urban construction planning, point clouds are used to generate digital traffic maps and to assist the planning of traffic routes and urban construction, improving planning efficiency and precision; in environmental monitoring and analysis, point cloud data can be used to build three-dimensional models of real scenes for analyzing landforms, hydrogeology, building damage and other conditions, which facilitates city management and maintenance.
In practical smart city applications, the pipeline from point cloud acquisition to application can generally be divided into the following five steps: (1) point cloud acquisition; (2) point cloud preprocessing; (3) point cloud feature extraction; (4) point cloud semantic segmentation; (5) downstream model deployment and application.
One difficulty in this pipeline is that feature extraction and semantic segmentation require a large amount of annotated data for model training. For feature extraction and semantic segmentation, traditional methods extract features with hand-designed feature descriptors, while deep learning methods extract features automatically with neural networks; both then perform semantic segmentation. However, the training of these methods is usually supervised learning, i.e. a large amount of labeled data is required to train the model. The point clouds of urban scenes are huge in scale, and manually labeling every point is cumbersome and expensive.
Disclosure of Invention
The invention aims to overcome the difficulty that urban scene semantic segmentation algorithms need a large amount of annotated data, and provides an urban scene semantic segmentation method based on a graph convolution and semi-supervised learning network.
The urban scene semantic segmentation method based on the graph convolution and semi-supervised learning network comprises the following steps:
S1, pre-training a graph convolution network with a public, labeled urban road dataset to obtain the initialization parameters of each layer of the graph convolution network;
S2, inputting an original point set P into the initialized graph convolution network at one time, where the points in P contain only coordinate (xyz) and color (rgb) information, and outputting a feature vector F1 with the graph convolution network;
S3, for the original point set P in step S2, using k-NN to find the k nearest points of each point to form its neighborhood, and computing a feature vector F2 from the neighborhood of each point;
S4, calculating the distance between the feature vectors F1 and F2 as a loss function for adjusting the parameters of the graph convolution network in step S2;
S5, taking the original point set P as the target semantic segmentation dataset T, which contains labeled data and unlabeled data, where the labeled data accounts for 1%-10% of the points of the original point set P; then, in the semi-supervised learning network, using the labeled data to assign pseudo labels to the unlabeled data;
S6, using the pseudo-labeled dataset T of step S5 for network inference, semantically segmenting it and predicting the category of each point.
Further, step S2 specifically includes:
S21, encoding the original point set P with the encoder of the graph convolution network to obtain an encoding feature f_E;
S22, decoding the encoding feature f_E with the decoder of the graph convolution network to obtain a decoding feature f_D;
S23, mapping the decoding feature f_D through an MLP to output the feature vector F1, where the dimension of each point in F1 is expressed as (r^1_x, r^1_y, r^1_z, r^1_r, r^1_g, r^1_b, r^2_r, r^2_g, r^2_b); the r-terms represent the encoded coordinate and color features respectively, the subscript of r denotes the feature channel, and a superscript 1 denotes a mean while a superscript 2 denotes a variance.
Further, step S3 specifically includes:
For the original point set P in step S2, k-NN is used to find the k nearest points of each point to form a neighborhood, and the feature vector F2 is computed from the neighborhood of each point, where the dimension of each point is expressed as (mu_x, mu_y, mu_z, mu_r, mu_g, mu_b, sigma^2_r, sigma^2_g, sigma^2_b).
The calculation process is:
mu_coord,i = (1/k) * sum_{n=1..k} coord_{n,i}
mu_color,i = (1/k) * sum_{n=1..k} color_{n,i}
sigma^2_color,i = (1/k) * sum_{n=1..k} (color_{n,i} - mu_color,i)^2
where mu_coord,i denotes the mean of the i-th neighborhood coordinate channel of a point, mu_color,i the mean of the i-th neighborhood color channel, and sigma^2_color,i the variance of the i-th neighborhood color channel; i takes 1, 2, 3, so that the self-learning process uses the same nine feature channels for F1 and F2, each feature channel corresponding to one feature distance to be calculated; n indexes the k neighboring points of each point.
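As a minimal illustration of how the neighborhood feature F2 of step S3 can be computed, the following Python sketch uses a k-d tree for the k-NN search; the array names and the choice of k are assumptions rather than values fixed by the patent.

```python
import numpy as np
from scipy.spatial import cKDTree

def neighborhood_features(xyz, rgb, k=16):
    """Compute F2: per-point (coord mean x3, color mean x3, color variance x3)
    over each point's k nearest neighbors, as described in step S3."""
    tree = cKDTree(xyz)
    _, idx = tree.query(xyz, k=k)          # idx: [N, k] neighbor indices
    neigh_xyz = xyz[idx]                   # [N, k, 3]
    neigh_rgb = rgb[idx]                   # [N, k, 3]
    mu_coord = neigh_xyz.mean(axis=1)      # mean of neighborhood coordinates
    mu_color = neigh_rgb.mean(axis=1)      # mean of neighborhood colors
    var_color = neigh_rgb.var(axis=1)      # variance of neighborhood colors
    return np.concatenate([mu_coord, mu_color, var_color], axis=1)  # [N, 9]

# Usage example with random stand-in data:
# xyz = np.random.rand(65536, 3); rgb = np.random.rand(65536, 3)
# F2 = neighborhood_features(xyz, rgb, k=16)
```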
Further, step S4 specifically includes:
Suppose the original point set P input into the graph convolution network in step S2 contains N points. The coordinate distance is calculated as the Euclidean distance and the color distance as the Manhattan distance.
The loss function of the coordinate distance, L_coord, is:
L_coord = sum_{i=1..N} sqrt( sum_{c in {x,y,z}} (r^1_{i,c} - mu_{i,c})^2 )
The loss function of the color distance, L_color, is:
L_color = sum_{i=1..N} sum_{c in {r,g,b}} ( |r^1_{i,c} - mu_{i,c}| + |r^2_{i,c} - sigma^2_{i,c}| )
Finally, the loss function is:
L_self = alpha * L_coord + beta * L_color
where i indexes the points of the original point set P, and alpha and beta are two hyperparameters of the graph convolution network, set to 1/3 and 2/3 respectively. This loss function is used to train the graph convolution network of step S2 and thereby further adjust the parameters of its encoder and decoder.
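The following sketch shows one way the self-supervised loss of step S4 could be computed from F1 and F2; the per-point averaging and tensor layout are assumptions consistent with the definitions above, not a verbatim transcription of the patent's formulas.

```python
import torch

def self_supervised_loss(F1, F2, alpha=1/3, beta=2/3):
    """F1, F2: [N, 9] tensors laid out as
    (coord mean x3, color mean x3, color variance x3)."""
    # Euclidean distance on the coordinate channels
    l_coord = torch.sqrt(((F1[:, :3] - F2[:, :3]) ** 2).sum(dim=1) + 1e-12).mean()
    # Manhattan distance on the color mean and color variance channels
    l_color = (F1[:, 3:] - F2[:, 3:]).abs().sum(dim=1).mean()
    return alpha * l_coord + beta * l_color
```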
Further, step S5 specifically includes:
S51, taking the original point set P as the target semantic segmentation dataset T; T is a set containing N points; the point set with labeled data in the original point set P is denoted P_L and contains N_L points, the point set of unlabeled data is denoted P_U and contains N_U points, and N_L + N_U = N with N_L much smaller than N_U;
S52, using the encoder and decoder trained and adjusted in step S4, and replacing the MLP that outputs the 9-dimensional vector in step S4 with an MLP of output dimension d; the output d-dimensional vector is denoted F';
S53, denoting the features of F' that correspond to the labeled points as F_L and the features that correspond to the unlabeled points as F_U; then F' = F_L ∪ F_U, where the elements of F_L and F_U are all d-dimensional vectors and indices are used to distinguish different points; T contains n_c categories, with 0 < n_c ≤ C, where C is the actual number of categories into which T is to be semantically segmented;
S54, selecting, from the known labeled data, the points belonging to category j and computing the feature average of these points to obtain the class-average feature vector c_j:
c_j = (1/N_j) * sum_{i : label(i) = j} f_{L,i}
where N_j denotes the number of labeled points whose category is j and f_{L,i} denotes the feature in F_L of a point whose category is j; then, for the input labeled points, an average feature vector c_j is calculated for each category j with 1 ≤ j ≤ n_c; for the remaining categories that are not present, c_j is recorded as the zero vector;
S55, computing the similarity matrix simi between the feature vectors f_{U,s} of the points of the unlabeled data P_U and the class-average vectors c_j:
simi_{j,s} = exp( -||c_j - f_{U,s}||_2 )
where ||c_j - f_{U,s}||_2 is the Euclidean distance between the average feature vector of a category and the vector of an unlabeled point; the first index of simi denotes the category and the second index denotes a point, with 1 ≤ j ≤ C and 1 ≤ s ≤ N_U; exp denotes the exponential with base e, the base of the natural logarithm, applied to the bracketed term; the dimension of simi is therefore C × N_U;
S56, mapping the feature vector F' of step S53 to a vector z as the prediction result, where the dimension of z for each point is C;
For the N_L labeled points, class prediction is realized directly with a Softmax classifier and a cross-entropy loss function, and the loss calculated over these points is denoted L_ce;
For the N_U unlabeled points, pseudo labels are generated first and then compared with z. Specifically: the points with the highest confidence in the similarity matrix simi are first selected category by category (at most T_num points per category); suppose N_sel points are selected in total, with T_num ≤ N_sel ≤ N_U; the category with the highest confidence is then chosen point by point for the selected points, and the maximum confidence of the pseudo label of each of these points and the corresponding label value are updated;
The predictive loss function of the N_U unlabeled points is designed as:
L_u = - sum_{s=1..N_U} 1[s ∈ S] * sum_{m=1..n_c} q_{s,m} * log(p_{s,m})
where the subscript s denotes any one of the N_U unlabeled points of P_U, i.e. s is the index over the unlabeled points; n_c is the number of categories contained in T; m is the category index; p_{s,m} is the probability value of the final predicted label; q_{s,m} encodes the pseudo-label category and takes 1 when the pseudo-label category equals category m and 0 otherwise; and the indicator 1[s ∈ S] takes 1 when the point belongs to the selected set S and 0 otherwise;
S57, the loss function of the whole graph convolution network, L_total, is:
L_total = L_ce + w * L_u
where the weight w is determined by the current training round epoch and the maximum training round max_epoch, and is chosen so that L_u receives a smaller weight at the beginning of training.
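The sketch below illustrates steps S56-S57 under stated assumptions: top-confidence points are picked per category from simi to form pseudo labels, the unlabeled loss is masked to the selected set, and the weight w ramps up linearly with the epoch. The linear ramp and the exact selection rule are reconstructions consistent with the description, not guaranteed to match the patent's formulas.

```python
import torch
import torch.nn.functional as F

def pseudo_label_loss(logits_U, simi, t_num):
    """logits_U: [N_U, C] network predictions for unlabeled points;
    simi: [C, N_U] similarity matrix; t_num: points kept per category."""
    C, N_U = simi.shape
    selected = torch.zeros(N_U, dtype=torch.bool)
    for j in range(C):                       # keep the most confident points per category
        top = torch.topk(simi[j], min(t_num, N_U)).indices
        selected[top] = True
    pseudo = simi.argmax(dim=0)              # per point, the category with highest confidence
    log_p = F.log_softmax(logits_U, dim=1)
    per_point = -log_p[torch.arange(N_U), pseudo]
    return (per_point * selected.float()).sum() / selected.float().sum().clamp(min=1)

def total_loss(loss_ce, loss_u, epoch, max_epoch):
    w = epoch / max_epoch                    # smaller weight for L_u early in training (assumed ramp)
    return loss_ce + w * loss_u
```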
Further, step S6 specifically includes:
The network is trained by iterating the pseudo-label assignment process of steps S51-S57 until it converges on the target dataset T; in the final prediction process, the trained network is used with the computation of the similarity matrix simi removed, a Softmax classifier is applied to all points of T, and the remaining data of T are then read in and iterated over, so as to realize the semantic segmentation and class prediction of all points of the target dataset T.
After the above technical scheme is adopted, the invention has the following advantages over the background art:
1. The invention adopts the idea of transfer learning and makes full use of the similar characteristics of different urban scenes; the initialization parameters of the graph convolution network are obtained from a public labeled dataset, which helps to improve the stability of the neural network when representing different datasets;
2. The invention adopts a self-learning pre-training task that makes full use of the local and color characteristics of objects in urban scenes; it does not require labeled data and can learn the prior distribution of the data to fine-tune the network parameters;
3. The invention uses semi-supervised learning to reduce the dependence on labeled data, so that high-quality pseudo labels can be generated with only a small amount of labeled data; this improves the effect of semi-supervised learning, realizes the semantic segmentation of the target dataset, and greatly reduces the dependence on manually labeled data.
Drawings
FIG. 1 is a flow chart of pre-training the graph convolution network and fine-tuning its parameters according to the present invention;
FIG. 2 is a flow chart of the training process in which the semi-supervised learning network of the present invention generates pseudo labels.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Examples
The urban scene semantic segmentation method based on the graph convolution and semi-supervised learning network comprises the following steps:
(I) Obtaining the initialization parameters by pre-training the graph convolution network (realized by the following step S1)
S1, pre-training a graph convolution network with a public, labeled dataset to obtain the initialization parameters of each layer of the graph convolution network;
The encoder of the graph convolution network consists of a multi-layer perceptron layer and four graph convolution modules, numbered 1, 2, 3 and 4 in sequence. The input feature of the l-th graph convolution module is denoted F_in^l and its output feature F_out^l, where F_out^l = F_in^(l+1), i.e. the output of the previous graph convolution module is the input of the next one; the current module receives N_l points with input feature dimension d_l, the feature dimension after convolution is d_(l+1), and the number of points is reduced to N_(l+1).
This embodiment adopts public, labeled urban road data as the pre-training dataset. The public dataset Toronto3D was acquired with a vehicle-mounted laser radar over about 1 km of high-quality urban street scene; more than eight million points are manually labeled, covering 8 common urban scene categories, including highways, zebra crossings, buildings, power lines, power towers, automobiles and fences, and all points contain coordinate and color information. A point cloud P is selected from Toronto3D as the input of the pre-trained graph convolution network; in this embodiment the number of points of P is 65536, so the dimension of P is [65536, 6].
First, the encoder is used to obtain the encoding feature f_E, where an MLP (multi-layer perceptron) maps [65536, 6] to [65536, 16] and the extracted features are then input into the four graph convolution modules; the decoding feature f_D is then obtained through the decoder, and its dimension is [65536, 16]. Next, a fully connected network and a Softmax classifier are used to predict the class of every point in P, and the per-point feature dimension change of this classification head is set to (16, 64, 64, 8). Then, cross-entropy is used as the loss function and stochastic gradient descent as the optimizer to pre-train the network and update the parameters of each layer. Finally, the above process is repeated until the network converges. The convergence condition is set to a fixed 100 training rounds, and training can be stopped if the prediction accuracy does not improve for 20 consecutive rounds. Instead of random initialization, the network of this embodiment is initialized with the pre-trained parameters.
(II) Performing the self-learning training task to fine-tune the parameters (realized by the following steps S2-S4)
S2, inputting an original point set P into the initialized graph convolution network at one time, where the points in P contain only coordinate (xyz) and color (rgb) information, and outputting a feature vector F1;
S3, for the original point set P in step S2, using k-NN to find the k nearest points of each point to form its neighborhood, and computing a feature vector F2 from the neighborhood of each point;
S4, calculating the distance between the feature vectors F1 and F2 as a loss function for adjusting the parameters of the graph convolution network in step S2;
Step S2 specifically includes:
Since step S1 has already initialized the layer parameters of the graph convolution network, step S2 uses only its encoder and decoder; the fully connected network and Softmax classifier of step S1 are replaced with an MLP layer, and the following steps are then performed.
S21, encoding the original point set P with the encoder of the graph convolution network to obtain the encoding feature f_E.
Specifically, for the original point set P input into the graph convolution network at one time, each point contains only the 6-dimensional features of coordinates xyz and colors rgb. P is mapped to 16 dimensions by a multi-layer perceptron, and the output feature serves as the input F_in^1 of the first graph convolution module; the output of the previous graph convolution module is the input of the next one, and after the four graph convolution modules the feature F_out^4 is output, which is the final encoding feature f_E.
S22, decoding the encoding feature f_E with the decoder of the graph convolution network to obtain the decoding feature f_D.
The encoding feature F_out^4 obtained in S21 is mapped by an MLP to a feature D^4 of the same dimension, which serves as the decoder input; each decoder stage decodes the feature D^l through nearest-neighbor up-sampling, an MLP and a skip connection with the encoder to obtain the next output feature. The decoding features carry the superscript l to distinguish them from the encoder features, and l takes 4, 3, 2 and 1 in sequence. The encoder is skip-connected to the decoder, i.e. the encoding feature F_out^l of the same dimension is added to the decoding feature D^l, and the sum is used as the input feature of the subsequent layer. The last decoding feature D^1 is the decoding feature f_D.
S23, mapping the decoding feature f_D through an MLP to output the feature vector F1, where the dimension of each point in F1 is expressed as (r^1_x, r^1_y, r^1_z, r^1_r, r^1_g, r^1_b, r^2_r, r^2_g, r^2_b); the r-terms represent the encoded coordinate and color features respectively, the subscript of r denotes the feature channel, and a superscript 1 denotes a mean while a superscript 2 denotes a variance.
Step S3 specifically includes:
First, for the original point set P in step S2, k-NN is used to find the k nearest points of each point to form a neighborhood.
Specifically, for each point p_i of the original point set P input to the network, k-NN is used to find its set of nearest neighbors, and the coordinate information is then embedded:
e_i^n = LBR(p_i, p_i^n, p_i - p_i^n, ||p_i - p_i^n||)
where the coordinate feature e_i^n is obtained from the point p_i and its neighboring point p_i^n, specifically by concatenating the absolute coordinates of the two points p_i and p_i^n, their offset p_i - p_i^n and their spatial distance ||p_i - p_i^n||; the symbol LBR means that the concatenated feature vector passes through a Linear layer, a BatchNorm layer and a ReLU layer in sequence; in each graph convolution module, e_i^n is mapped to the same dimension as the point set features input to that module.
Then, the relation between the point p_i and its neighboring point p_i^n is expressed as the edge relation E_i^n:
E_i^n = R(g(F_in^l, e_i^n))
where F_in^l is the point set feature input to the l-th graph convolution module and e_i^n is its coordinate feature; after concatenation they are weighted with a learnable weight g, which can be implemented with an MLP, a 1D-CNN and the like; R denotes the ReLU layer. Finally, Max-Pooling is used to aggregate the edge features of each point channel by channel, and random sampling is used to reduce the number of points, which gives the output feature F_out^l.
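To make the graph convolution module concrete, the following PyTorch sketch implements the sequence described above: LBR embedding of relative coordinates, a learnable weighting g realized here as an MLP, a ReLU, channel-wise max-pooling over the k neighbors, and random down-sampling. Layer sizes, the use of nn.Linear for g and all tensor layouts are assumptions; the patent does not fix them.

```python
import torch
import torch.nn as nn

class GraphConvModule(nn.Module):
    """One graph convolution module: edge features over k-NN neighborhoods,
    max-pooling aggregation, then random point down-sampling (a sketch)."""
    def __init__(self, d_in, d_out, k=16, keep_ratio=0.25):
        super().__init__()
        self.k, self.keep_ratio = k, keep_ratio
        self.lbr = nn.Sequential(nn.Linear(10, d_in), nn.BatchNorm1d(d_in), nn.ReLU())
        self.g = nn.Sequential(nn.Linear(2 * d_in, d_out), nn.ReLU())

    def forward(self, xyz, feat):
        # xyz: [N, 3] coordinates, feat: [N, d_in] point features
        dist = torch.cdist(xyz, xyz)                       # [N, N] pairwise distances
        idx = dist.topk(self.k, largest=False).indices     # [N, k] nearest neighbors
        nbr_xyz, nbr_feat = xyz[idx], feat[idx]            # [N, k, 3], [N, k, d_in]
        rel = xyz.unsqueeze(1) - nbr_xyz                   # coordinate offsets
        d = rel.norm(dim=-1, keepdim=True)                 # spatial distances
        geo = torch.cat([xyz.unsqueeze(1).expand_as(nbr_xyz), nbr_xyz, rel, d], dim=-1)
        e = self.lbr(geo.reshape(-1, 10)).reshape(feat.size(0), self.k, -1)
        edge = self.g(torch.cat([nbr_feat, e], dim=-1))    # edge relation per neighbor
        out = edge.max(dim=1).values                       # channel-wise max-pooling
        keep = torch.randperm(out.size(0))[: int(out.size(0) * self.keep_ratio)]
        return xyz[keep], out[keep]                        # randomly down-sampled output
```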
The feature vector F2 is then calculated from the neighborhood of each point, where the dimension of each point is expressed as (mu_x, mu_y, mu_z, mu_r, mu_g, mu_b, sigma^2_r, sigma^2_g, sigma^2_b).
The calculation process is:
mu_coord,i = (1/k) * sum_{n=1..k} coord_{n,i}
mu_color,i = (1/k) * sum_{n=1..k} color_{n,i}
sigma^2_color,i = (1/k) * sum_{n=1..k} (color_{n,i} - mu_color,i)^2
where mu_coord,i denotes the mean of the i-th neighborhood coordinate channel of a point, mu_color,i the mean of the i-th neighborhood color channel, and sigma^2_color,i the variance of the i-th neighborhood color channel; i takes 1, 2, 3, so that the self-learning process uses the same nine feature channels for F1 and F2, each feature channel corresponding to one feature distance to be calculated; n indexes the k neighboring points of each point. Because urban street-scene point clouds are sparsely and unevenly distributed, the neighborhood coordinate variance is large, so the network uses only the coordinate mean. Objects such as vegetation (green) and pavement (gray) have distinct color characteristics and their local color usually changes smoothly, so both the mean and the variance are used for the color channels.
Step S4 specifically includes:
Suppose the original point set P input into the graph convolution network in step S2 contains N points. The coordinate distance is calculated as the Euclidean distance and the color distance as the Manhattan distance.
The loss function of the coordinate distance, L_coord, is:
L_coord = sum_{i=1..N} sqrt( sum_{c in {x,y,z}} (r^1_{i,c} - mu_{i,c})^2 )
The loss function of the color distance, L_color, is:
L_color = sum_{i=1..N} sum_{c in {r,g,b}} ( |r^1_{i,c} - mu_{i,c}| + |r^2_{i,c} - sigma^2_{i,c}| )
Finally, the loss function is:
L_self = alpha * L_coord + beta * L_color
where i indexes the points of the original point set P, and alpha and beta are two hyperparameters of the graph convolution network, set to 1/3 and 2/3 respectively. This loss function is used to train the graph convolution network of step S2 and thereby further adjust the parameters of its encoder and decoder.
Steps S2-S4 realize the fine-tuning of the parameters of the pre-trained graph convolution network. Specifically, in this embodiment:
(1) The pre-trained encoder and decoder are kept first, and the fully connected layer that follows the decoder is changed into a multi-layer perceptron (MLP) whose per-point feature dimension change is set to (16, 32, 9). A point cloud P is constructed from the target semantic segmentation dataset T (constructed in the same way as in the pre-training above, only the dataset changes); this point cloud is passed through the network of FIG. 1 to output the feature F1, whose dimension is [65536, 9].
(2) At the same time, a neighborhood is constructed for each point of P with k-NN, and the number of neighborhood points k is set to 16. The features of one point are calculated as:
mu_coord,i = (1/k) * sum_{n=1..k} coord_{n,i}
mu_color,i = (1/k) * sum_{n=1..k} color_{n,i}
sigma^2_color,i = (1/k) * sum_{n=1..k} (color_{n,i} - mu_color,i)^2
where mu_coord,i denotes the mean of the i-th neighborhood coordinate channel of the point, mu_color,i the mean of the i-th neighborhood color channel, and sigma^2_color,i the variance of the i-th neighborhood color channel; i takes 1, 2, 3, the self-learning process uses the same nine feature channels for F1 and F2, each feature channel corresponding to one feature distance to be calculated, and n indexes the k neighboring points of each point. The features of this point are therefore expressed as (mu_x, mu_y, mu_z, mu_r, mu_g, mu_b, sigma^2_r, sigma^2_g, sigma^2_b), and the features of all points of the constructed point cloud P form F2, whose dimension is [65536, 9].
(3) The distance between F1 and F2 is calculated as the loss for training the network of FIG. 1. The feature distance associated with the coordinates is calculated as the Euclidean distance and the feature distance associated with the colors as the Manhattan distance. The coordinate distance loss function L_coord is:
L_coord = sum_{i=1..N} sqrt( sum_{c in {x,y,z}} (r^1_{i,c} - mu_{i,c})^2 )
The color distance loss function L_color is:
L_color = sum_{i=1..N} sum_{c in {r,g,b}} ( |r^1_{i,c} - mu_{i,c}| + |r^2_{i,c} - sigma^2_{i,c}| )
Finally, the loss function is:
L_self = alpha * L_coord + beta * L_color
where i indexes the points of the original point set P, and alpha and beta are two hyperparameters set to 1/3 and 2/3.
Training is optimized with stochastic gradient descent for a fixed 30 rounds. This pre-training fine-tunes the parameters of the encoder and decoder so that they adapt to the data distribution of the target dataset T.
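Putting the pieces of this subsection together, the following sketch fine-tunes the pre-trained encoder and decoder with the self-supervised loss for 30 rounds of SGD; the model and data-loading helpers are hypothetical names, and the functions `neighborhood_features` and `self_supervised_loss` refer to the sketches given earlier.

```python
import numpy as np
import torch

def finetune(model, point_clouds, rounds=30, lr=0.01, k=16):
    """model: pre-trained encoder + decoder with a (16, 32, 9) MLP head (hypothetical).
    point_clouds yields numpy arrays (xyz [65536, 3], rgb [65536, 3])."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(rounds):
        for xyz, rgb in point_clouds:
            F2 = torch.from_numpy(neighborhood_features(xyz, rgb, k=k)).float()
            pts = torch.from_numpy(np.concatenate([xyz, rgb], axis=1)).float()
            F1 = model(pts.unsqueeze(0)).squeeze(0)     # [65536, 9] network output
            loss = self_supervised_loss(F1, F2)
            opt.zero_grad(); loss.backward(); opt.step()
    return model
```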
(III) Generating pseudo labels and performing semantic segmentation with the semi-supervised learning network (realized by the following steps S5 and S6)
S5, taking the original point set P as the target semantic segmentation dataset T, which contains a small amount of labeled data and a large amount of unlabeled data, where the labeled data accounts for 1%-10% of the points of the original point set P; then, in the semi-supervised learning network, using the labeled data to assign pseudo labels to the unlabeled data;
S6, using the pseudo-labeled dataset T of step S5 for network inference, semantically segmenting it and predicting the category of each point.
Step S5 specifically includes:
S51, taking the original point set P as the target semantic segmentation dataset T; T is a set containing N points; the point set with labeled data in the original point set P is denoted P_L and contains N_L points, the point set of unlabeled data is denoted P_U and contains N_U points, and N_L + N_U = N with N_L much smaller than N_U;
S52, using the encoder and decoder trained and adjusted in step S4, and replacing the MLP that outputs the 9-dimensional vector in step S4 with an MLP of output dimension d; the output d-dimensional vector is denoted F';
S53, denoting the features of F' that correspond to the labeled points as F_L and the features that correspond to the unlabeled points as F_U; then F' = F_L ∪ F_U, where the elements of F_L and F_U are all d-dimensional vectors and indices are used to distinguish different points; T contains n_c categories, with 0 < n_c ≤ C, where C is the actual number of categories into which T is to be semantically segmented;
S54, selecting, from the known labeled data, the points belonging to category j and computing the feature average of these points to obtain the class-average feature vector c_j:
c_j = (1/N_j) * sum_{i : label(i) = j} f_{L,i}
where N_j denotes the number of labeled points whose category is j and f_{L,i} denotes the feature in F_L of a point whose category is j; then, for the input labeled points, an average feature vector c_j is calculated for each category j with 1 ≤ j ≤ n_c; for the remaining categories that are not present, c_j is recorded as the zero vector;
S55, computing the similarity matrix simi between the feature vectors f_{U,s} of the points of the unlabeled data P_U and the class-average vectors c_j:
simi_{j,s} = exp( -||c_j - f_{U,s}||_2 )
where ||c_j - f_{U,s}||_2 is the Euclidean distance between the average feature vector of a category and the vector of an unlabeled point; the first index of simi denotes the category and the second index denotes a point, with 1 ≤ j ≤ C and 1 ≤ s ≤ N_U; exp denotes the exponential with base e, the base of the natural logarithm, applied to the bracketed term; the dimension of simi is therefore C × N_U;
S56, mapping the feature vector F' of step S53 to a vector z as the prediction result, where the dimension of z for each point is C;
For the N_L labeled points, class prediction is realized directly with a Softmax classifier and a cross-entropy loss function, and the loss calculated over these points is denoted L_ce;
For the N_U unlabeled points, pseudo labels are generated first and then compared with z. If points whose pseudo labels have low confidence were used, larger errors would be introduced into the segmentation result. Therefore, the points with the highest confidence in the similarity matrix simi are selected category by category (at most T_num points per category), and for each selected point the category with the highest confidence is chosen.
Specifically: the points with the highest confidence in the similarity matrix simi are first selected category by category; suppose N_sel points are selected in total, with T_num ≤ N_sel ≤ N_U; the category with the highest confidence is then chosen point by point for the selected points, and the maximum confidence of the pseudo label of each of these points and the corresponding label value are updated.
The predictive loss function of the N_U unlabeled points is designed as:
L_u = - sum_{s=1..N_U} 1[s ∈ S] * sum_{m=1..n_c} q_{s,m} * log(p_{s,m})
where the subscript s denotes any one of the N_U unlabeled points of P_U, i.e. s is the index over the unlabeled points; n_c is the number of categories contained in T; m is the category index; p_{s,m} is the probability value of the final predicted label; q_{s,m} encodes the pseudo-label category and takes 1 when the pseudo-label category equals category m and 0 otherwise; and the indicator 1[s ∈ S] takes 1 when the point belongs to the selected set S and 0 otherwise;
S57, the loss function of the whole graph convolution network, L_total, is:
L_total = L_ce + w * L_u
where the weight w is determined by the current training round epoch and the maximum training round max_epoch, and is chosen so that L_u receives a smaller weight at the beginning of training.
Step S6 specifically includes:
The network is trained by iterating the pseudo-label assignment process of steps S51-S57 until it converges on the target dataset T; in the final prediction process, the trained network is used with the computation of the similarity matrix simi removed, a Softmax classifier is applied to all points of T, and the remaining data of T are then read in and iterated over, so as to realize the semantic segmentation and class prediction of all points of the target dataset T.
Specifically, this embodiment modifies the layer following the decoder in FIG. 1 into two MLPs: the per-point feature dimension change of the first MLP is set to (16, 32, 32), and it outputs the feature F', whose dimension is [65536, 32]; the per-point feature dimension change of the second MLP is set to (32, 32, 8), and it outputs the prediction feature z, whose dimension is [65536, 8]. The modified network architecture is shown in FIG. 2.
The target semantic segmentation dataset T needs to contain a small amount of labeled data and a large amount of unlabeled data, and in step S5 the labeled data are used to assign pseudo labels to the unlabeled data. A point cloud P is constructed from the target semantic segmentation dataset T (constructed in the same way as in the preceding self-learning pre-training task), and each point cloud constructed during semi-supervised training must contain 1%-10% of points with labeling information. For example, in one training round the 65536 points contain 4096 labeled points, i.e. the labeling information accounts for 6.25% and covers 5 categories in total. The labeled points P_L are used to calculate the feature average of the labeled points of each category, giving the class-average feature vector c_j:
c_j = (1/N_j) * sum_{i : label(i) = j} f_{L,i}
where f_{L,i} denotes the feature corresponding to a labeled point of P_L whose category is j. Then, for the 4096 labeled points of the input, an average feature vector c_j is calculated for each category j with 1 ≤ j ≤ 5. For the remaining 3 categories that are not present, c_j is recorded as the zero vector.
Next, the similarity matrix simi between the feature vectors f_{U,s} of the unlabeled points of P_U and the class-average vectors c_j is calculated:
simi_{j,s} = exp( -||c_j - f_{U,s}||_2 )
where 1 ≤ j ≤ 8 and 1 ≤ s ≤ N_U, so the dimension of simi is [8, N_U].
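A small sketch of the class-average and similarity computation follows: class-average feature vectors are computed from the labeled features, missing classes keep zero vectors, and simi is built with the exponential of the negative Euclidean distance. Variable names and shapes are illustrative assumptions.

```python
import torch

def class_average_vectors(F_L, labels_L, num_classes):
    """F_L: [N_L, d] labeled features; labels_L: [N_L] integer labels in [0, C).
    Returns c: [C, d]; classes absent from the labeled data stay zero vectors."""
    d = F_L.shape[1]
    c = torch.zeros(num_classes, d)
    for j in range(num_classes):
        mask = labels_L == j
        if mask.any():
            c[j] = F_L[mask].mean(dim=0)
    return c

def similarity_matrix(c, F_U):
    """c: [C, d] class averages; F_U: [N_U, d] unlabeled features.
    simi[j, s] = exp(-||c_j - f_{U,s}||_2), shape [C, N_U]."""
    dist = torch.cdist(c, F_U, p=2)   # Euclidean distances, [C, N_U]
    return torch.exp(-dist)
```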
Then, the feature vector F' is mapped to the vector z as the prediction result, and the dimension of z for each point is 8. The 4096 labeled points directly realize class prediction with a Softmax classifier and a cross-entropy loss function. For the N_U unlabeled points, however, there are no labels, and pseudo labels need to be generated and compared with z. The points with the highest confidence in the similarity matrix simi are first selected category by category (at most T_num points per category), and the category with the highest confidence is then chosen point by point for the selected points. Suppose N_sel points are selected in total, with T_num ≤ N_sel ≤ N_U; the maximum confidence of the pseudo labels of these N_sel points and the corresponding label values are then updated. T_num is set to 50% of the number of points of each category. The number of training rounds is set to 100, and Adam is selected as the optimization method.
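A sketch of one semi-supervised training round under the embodiment's settings (Adam, 100 rounds, T_num tied to 50% of a category's point count) could look like the following; the model interface, data splitting helpers and batch layout are assumptions, and `class_average_vectors`, `similarity_matrix`, `pseudo_label_loss` and `total_loss` refer to the earlier sketches.

```python
import torch
import torch.nn.functional as F

def train_semi(model, xyz_rgb, labels, labeled_mask, num_classes=8,
               max_epoch=100, lr=1e-3):
    """xyz_rgb: [65536, 6]; labels: [65536] (valid only where labeled_mask is True)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(1, max_epoch + 1):
        feats, logits = model(xyz_rgb.unsqueeze(0))          # F' [1, N, 32], z [1, N, 8]
        feats, logits = feats.squeeze(0), logits.squeeze(0)
        F_L, y_L = feats[labeled_mask], labels[labeled_mask]
        F_U, logits_U = feats[~labeled_mask], logits[~labeled_mask]
        loss_ce = F.cross_entropy(logits[labeled_mask], y_L)
        c = class_average_vectors(F_L, y_L, num_classes)
        simi = similarity_matrix(c, F_U)
        # T_num: half the labeled count of the most common category (one possible reading)
        t_num = int(0.5 * torch.bincount(y_L, minlength=num_classes).max().item())
        loss_u = pseudo_label_loss(logits_U, simi, t_num)
        loss = total_loss(loss_ce, loss_u, epoch, max_epoch)
        opt.zero_grad(); loss.backward(); opt.step()
    return model
```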
Finally, the trained network is used with the computation of the similarity matrix simi removed, i.e. the computation from F_U to simi and from simi to the pseudo labels is removed. Then, for the point cloud P input to the network at one time during testing, a Softmax classifier is used to predict the label value of every point; point clouds P are constructed iteratively until all points of T have been read, so that label values are predicted for all points and the semantic segmentation is realized.
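The final inference stage can be sketched as follows; `iter_blocks` (which yields 65536-point blocks until the whole target dataset T has been covered) and the model interface are assumptions.

```python
import torch

@torch.no_grad()
def predict_dataset(model, iter_blocks):
    """Predict a label for every point of the target dataset T, block by block,
    with the similarity-matrix branch removed (only the Softmax head is used)."""
    model.eval()
    all_labels = []
    for xyz_rgb in iter_blocks():                 # [65536, 6] blocks covering T
        _, logits = model(xyz_rgb.unsqueeze(0))   # pseudo-label branch unused at test time
        probs = torch.softmax(logits.squeeze(0), dim=-1)
        all_labels.append(probs.argmax(dim=-1))   # predicted category per point
    return torch.cat(all_labels)
```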
The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (6)

1. An urban scene semantic segmentation method based on a graph convolution and semi-supervised learning network, characterized by comprising the following steps:
S1, pre-training a graph convolution network with a public, labeled urban road dataset to obtain the initialization parameters of each layer of the graph convolution network;
S2, inputting an original point set P into the initialized graph convolution network at one time, where the points in P contain only coordinate (xyz) and color (rgb) information, and outputting a feature vector F1 with the graph convolution network;
S3, for the original point set P in step S2, using k-NN to find the k nearest points of each point to form its neighborhood, and computing a feature vector F2 from the neighborhood of each point;
S4, calculating the distance between the feature vectors F1 and F2 as a loss function for adjusting the parameters of the graph convolution network in step S2;
S5, taking the original point set P as the target semantic segmentation dataset T, which contains labeled data and unlabeled data, where the labeled data accounts for 1%-10% of the points of the original point set P; then, in the semi-supervised learning network, using the labeled data to assign pseudo labels to the unlabeled data;
S6, using the pseudo-labeled dataset T of step S5 for network inference, semantically segmenting it and predicting the category of each point.
2. The urban scene semantic segmentation method based on a graph convolution and semi-supervised learning network as set forth in claim 1, wherein step S2 specifically includes:
S21, encoding the original point set P with the encoder of the graph convolution network to obtain an encoding feature f_E;
S22, decoding the encoding feature f_E with the decoder of the graph convolution network to obtain a decoding feature f_D;
S23, mapping the decoding feature f_D through an MLP to output the feature vector F1, where the dimension of each point in F1 is expressed as (r^1_x, r^1_y, r^1_z, r^1_r, r^1_g, r^1_b, r^2_r, r^2_g, r^2_b); the r-terms represent the encoded coordinate and color features respectively, the subscript of r denotes the feature channel, and a superscript 1 denotes a mean while a superscript 2 denotes a variance.
3. The urban scene semantic segmentation method based on a graph convolution and semi-supervised learning network as set forth in claim 2, wherein step S3 specifically includes:
for the original point set P in step S2, k-NN is used to find the k nearest points of each point to form a neighborhood, and the feature vector F2 is computed from the neighborhood of each point, where the dimension of each point is expressed as (mu_x, mu_y, mu_z, mu_r, mu_g, mu_b, sigma^2_r, sigma^2_g, sigma^2_b);
the calculation process is:
mu_coord,i = (1/k) * sum_{n=1..k} coord_{n,i}
mu_color,i = (1/k) * sum_{n=1..k} color_{n,i}
sigma^2_color,i = (1/k) * sum_{n=1..k} (color_{n,i} - mu_color,i)^2
where mu_coord,i denotes the mean of the i-th neighborhood coordinate channel of a point, mu_color,i the mean of the i-th neighborhood color channel, and sigma^2_color,i the variance of the i-th neighborhood color channel; i takes 1, 2, 3, so that the self-learning process uses the same nine feature channels for F1 and F2, each feature channel corresponding to one feature distance to be calculated; n indexes the k neighboring points of each point.
4. The urban scene semantic segmentation method based on a graph convolution and semi-supervised learning network as set forth in claim 3, wherein step S4 specifically includes:
suppose the original point set P input into the graph convolution network in step S2 contains N points; the coordinate distance is calculated as the Euclidean distance and the color distance as the Manhattan distance;
the loss function of the coordinate distance, L_coord, is:
L_coord = sum_{i=1..N} sqrt( sum_{c in {x,y,z}} (r^1_{i,c} - mu_{i,c})^2 )
the loss function of the color distance, L_color, is:
L_color = sum_{i=1..N} sum_{c in {r,g,b}} ( |r^1_{i,c} - mu_{i,c}| + |r^2_{i,c} - sigma^2_{i,c}| )
finally, the loss function is:
L_self = alpha * L_coord + beta * L_color
where i indexes the points of the original point set P, and alpha and beta are two hyperparameters of the graph convolution network, set to 1/3 and 2/3 respectively; the loss function is used to train the graph convolution network of step S2 and thereby further adjust the parameters of its encoder and decoder.
5. The urban scene semantic segmentation method based on a graph convolution and semi-supervised learning network as set forth in claim 4, wherein step S5 specifically includes:
S51, taking the original point set P as the target semantic segmentation dataset T; T is a set containing N points; the point set with labeled data in the original point set P is denoted P_L and contains N_L points, the point set of unlabeled data is denoted P_U and contains N_U points, and N_L + N_U = N with N_L much smaller than N_U;
S52, using the encoder and decoder trained and adjusted in step S4, and replacing the MLP that outputs the 9-dimensional vector in step S4 with an MLP of output dimension d; the output d-dimensional vector is denoted F';
S53, denoting the features of F' that correspond to the labeled points as F_L and the features that correspond to the unlabeled points as F_U; then F' = F_L ∪ F_U, where the elements of F_L and F_U are all d-dimensional vectors and indices are used to distinguish different points; T contains n_c categories, with 0 < n_c ≤ C, where C is the actual number of categories into which T is to be semantically segmented;
S54, selecting, from the known labeled data, the points belonging to category j and computing the feature average of these points to obtain the class-average feature vector c_j:
c_j = (1/N_j) * sum_{i : label(i) = j} f_{L,i}
where N_j denotes the number of labeled points whose category is j and f_{L,i} denotes the feature in F_L of a point whose category is j; then, for the input labeled points, an average feature vector c_j is calculated for each category j with 1 ≤ j ≤ n_c; for the remaining categories that are not present, c_j is recorded as the zero vector;
S55, computing the similarity matrix simi between the feature vectors f_{U,s} of the points of the unlabeled data P_U and the class-average vectors c_j:
simi_{j,s} = exp( -||c_j - f_{U,s}||_2 )
where ||c_j - f_{U,s}||_2 is the Euclidean distance between the average feature vector of a category and the vector of an unlabeled point; the first index of simi denotes the category and the second index denotes a point, with 1 ≤ j ≤ C and 1 ≤ s ≤ N_U; exp denotes the exponential with base e, the base of the natural logarithm, applied to the bracketed term; the dimension of simi is C × N_U;
S56, mapping the feature vector F' of step S53 to a vector z as the prediction result, where the dimension of z for each point is C;
for the N_L labeled points, class prediction is realized directly with a Softmax classifier and a cross-entropy loss function, and the loss calculated over these points is denoted L_ce;
for the N_U unlabeled points, pseudo labels are generated first and then compared with z; specifically: the points with the highest confidence in the similarity matrix simi are first selected category by category (at most T_num points per category); suppose N_sel points are selected in total, with T_num ≤ N_sel ≤ N_U; the category with the highest confidence is then chosen point by point for the selected points, and the maximum confidence of the pseudo label of each of these points and the corresponding label value are updated;
the predictive loss function of the N_U unlabeled points is designed as:
L_u = - sum_{s=1..N_U} 1[s ∈ S] * sum_{m=1..n_c} q_{s,m} * log(p_{s,m})
where the subscript s denotes any one of the N_U unlabeled points of P_U, i.e. s is the index over the unlabeled points; n_c is the number of categories contained in T; m is the category index; p_{s,m} is the probability value of the final predicted label; q_{s,m} encodes the pseudo-label category and takes 1 when the pseudo-label category equals category m and 0 otherwise; and the indicator 1[s ∈ S] takes 1 when the point belongs to the selected set S and 0 otherwise;
S57, the loss function of the whole graph convolution network, L_total, is:
L_total = L_ce + w * L_u
where the weight w is determined by the current training round epoch and the maximum training round max_epoch, and is chosen so that L_u receives a smaller weight at the beginning of training.
6. The urban scene semantic segmentation method based on a graph convolution and semi-supervised learning network as set forth in claim 5, wherein step S6 specifically includes:
the network is trained by iterating the pseudo-label assignment process of steps S51-S57 until it converges on the target dataset T; in the final prediction process, the trained network is used with the computation of the similarity matrix simi removed, a Softmax classifier is applied to all points of T, and the remaining data of T are then read in and iterated over, so as to realize the semantic segmentation and class prediction of all points of the target dataset T.
CN202310596881.7A 2023-05-25 2023-05-25 Urban scene semantic segmentation method based on graph convolution and semi-supervised learning network Active CN116310350B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310596881.7A CN116310350B (en) 2023-05-25 2023-05-25 Urban scene semantic segmentation method based on graph convolution and semi-supervised learning network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310596881.7A CN116310350B (en) 2023-05-25 2023-05-25 Urban scene semantic segmentation method based on graph convolution and semi-supervised learning network

Publications (2)

Publication Number Publication Date
CN116310350A true CN116310350A (en) 2023-06-23
CN116310350B CN116310350B (en) 2023-08-18

Family

ID=86785552

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310596881.7A Active CN116310350B (en) 2023-05-25 2023-05-25 Urban scene semantic segmentation method based on graph convolution and semi-supervised learning network

Country Status (1)

Country Link
CN (1) CN116310350B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11450008B1 (en) * 2020-02-27 2022-09-20 Amazon Technologies, Inc. Segmentation using attention-weighted loss and discriminative feature learning
CN112070779A (en) * 2020-08-04 2020-12-11 武汉大学 Remote sensing image road segmentation method based on convolutional neural network weak supervised learning
CN112785611A (en) * 2021-01-29 2021-05-11 昆明理工大学 3D point cloud weak supervision semantic segmentation method and system
CN112861722A (en) * 2021-02-09 2021-05-28 中国科学院地理科学与资源研究所 Remote sensing land utilization semantic segmentation method based on semi-supervised depth map convolution
US20220375187A1 (en) * 2021-07-26 2022-11-24 Beijing Baidu Netcom Science Technology Co., Ltd. Method of performing object segmentation on video using semantic segmentation model, device and storage medium
CN113936217A (en) * 2021-10-25 2022-01-14 华中师范大学 Priori semantic knowledge guided high-resolution remote sensing image weakly supervised building change detection method
CN114187446A (en) * 2021-12-09 2022-03-15 厦门大学 Cross-scene contrast learning weak supervision point cloud semantic segmentation method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116863432A (en) * 2023-09-04 2023-10-10 之江实验室 Weak supervision laser travelable region prediction method and system based on deep learning
CN116863432B (en) * 2023-09-04 2023-12-22 之江实验室 Weak supervision laser travelable region prediction method and system based on deep learning
CN117576217A (en) * 2024-01-12 2024-02-20 电子科技大学 Object pose estimation method based on single-instance image reconstruction
CN117576217B (en) * 2024-01-12 2024-03-26 电子科技大学 Object pose estimation method based on single-instance image reconstruction

Also Published As

Publication number Publication date
CN116310350B (en) 2023-08-18

Similar Documents

Publication Publication Date Title
CN116310350B (en) Urban scene semantic segmentation method based on graph convolution and semi-supervised learning network
CN111612066B (en) Remote sensing image classification method based on depth fusion convolutional neural network
CN110796168A (en) Improved YOLOv 3-based vehicle detection method
CN111914611B (en) Urban green space high-resolution remote sensing monitoring method and system
CN113487066B (en) Long-time-sequence freight volume prediction method based on multi-attribute enhanced graph convolution-Informer model
CN112149547B (en) Remote sensing image water body identification method based on image pyramid guidance and pixel pair matching
CN112507793A (en) Ultra-short-term photovoltaic power prediction method
CN112132149B (en) Semantic segmentation method and device for remote sensing image
CN111368846B (en) Road ponding identification method based on boundary semantic segmentation
CN110853057B (en) Aerial image segmentation method based on global and multi-scale full-convolution network
CN113449594A (en) Multilayer network combined remote sensing image ground semantic segmentation and area calculation method
CN113256649B (en) Remote sensing image station selection and line selection semantic segmentation method based on deep learning
CN115482491B (en) Bridge defect identification method and system based on transformer
CN111967325A (en) Unsupervised cross-domain pedestrian re-identification method based on incremental optimization
CN113591617B (en) Deep learning-based water surface small target detection and classification method
CN112712052A (en) Method for detecting and identifying weak target in airport panoramic video
CN114299286A (en) Road scene semantic segmentation method based on category grouping in abnormal weather
Tian et al. Semantic segmentation of remote sensing image based on GAN and FCN network model
CN117237660A (en) Point cloud data processing and segmentation method based on deep learning feature aggregation
CN117011701A (en) Remote sensing image feature extraction method for hierarchical feature autonomous learning
Yao et al. Cloud Detection in Optical Remote Sensing Images with Deep Semi-supervised and Active Learning
CN111368843A (en) Method for extracting lake on ice based on semantic segmentation
CN115965867A (en) Remote sensing image earth surface coverage classification method based on pseudo label and category dictionary learning
CN114694019A (en) Remote sensing image building migration extraction method based on anomaly detection
Wang et al. Quantitative Evaluation of Plant and Modern Urban Landscape Spatial Scale Based on Multiscale Convolutional Neural Network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant