WO2022121545A1

WO2022121545A1 - Graph convolutional network-based grid segmentation method

Info

Publication number: WO2022121545A1
Application number: PCT/CN2021/126910
Authority: WO
Inventors: 倪天宇; 郑友怡
Original assignee: 浙江大学
Priority date: 2020-12-10
Filing date: 2021-10-28
Publication date: 2022-06-16
Also published as: CN112634281A

Abstract

The present invention provides a mesh segmentation method based on a graph convolutional network. The method takes the faces of a mesh as basic units and performs graph convolution on the dual graph formed from the face adjacency relationship, so as to obtain a feature representation for each face. In the feature extraction stage, static and dynamic edge convolutions are used together, so that the network learns from potential relationships between faces while still exploiting the actual adjacency relationship. In addition, the features are further enhanced using the idea of feature embedding from instance segmentation, and the enhanced features are finally used to segment the mesh into its parts. The present invention obtains better results on multiple part segmentation data sets.

Description

A Grid Segmentation Method Based on Graph Convolutional Networks

Technical field

The invention belongs to the fields of computer graphics and computer vision, and in particular relates to a mesh part segmentation method based on a graph convolutional network.

Background art

Semantic segmentation is one of the key problems in computer vision. With the development of deep learning, the use of neural networks for semantic segmentation of two-dimensional images has been widely studied. When the problem is extended to 3D meshes, image-based operations are often not directly applicable because of the irregularity of meshes. Previous methods often voxelize the 3D model, or represent the 3D object with multi-view 2D images, and then apply 2D image methods. The former often increases the amount of computation because of the sparsity of the data, while the latter abandons the original structure of the three-dimensional object and still requires a large amount of computation. For three-dimensional mesh data, the present method transforms the mesh into its dual space, with faces as nodes, and uses a graph convolutional neural network to learn features on the resulting graph.

Early graph convolutional neural networks often required static graph structures, while recent research on dynamic graph convolutions shows that dynamic edges can achieve better results. Our method utilizes both static edge convolution and dynamic edge convolution for feature learning, taking advantage of the original geometric structure while also considering potential similarity relationships.

In the field of instance segmentation, feature embedding is a commonly used method. The main idea is to obtain a representation in which elements of the same category are close together and elements of different categories are far apart, and then to use this representation to obtain the final instance segmentation. Our method borrows this idea and uses the representations obtained from feature embedding for the final part segmentation.

SUMMARY OF THE INVENTION

The invention proposes a mesh segmentation method based on a graph convolutional network (GCN), which forms a graph representation of a mesh according to the adjacency relationship of its faces, so as to realize effective feature learning through graph convolution and feature embedding. The graph convolution used in the present invention combines static edge convolution and dynamic edge convolution, and thus considers relationships both in the original geometric structure and in the feature space. The present invention also uses feature embedding to constrain the distribution of features in the feature space.

The present invention is achieved through the following technical solutions:

A mesh segmentation method based on a graph convolutional network, including the following steps:

Step 1: Transform the mesh model to the specified number of faces and standardize it.

Step 2: Convert the model processed in Step 1 into a graph representation, perform preliminary feature extraction on each face, and input the features into the trained graph convolutional neural network to predict the type of part that each face in the mesh belongs to. The graph convolutional neural network includes:

The transformation module, which is used to bring the input preliminary features to similar orientations.

The graph convolution module, which is used to learn, from the transformed preliminary features, features related to the adjacent faces in real space and the adjacent faces in feature space.

The feature embedding module, which is used to obtain, from the features produced by the graph convolution module, embeddings in which faces of the same class are close together and faces of different classes are far apart.

The output module, which is used to obtain the predicted segmentation result from the features learned by the graph convolution module and the result of feature embedding.

Further, the step 1 is realized by the following sub-steps:

(1.1) For the input model, simplify or subdivide it to the specified number of faces.

(1.2) Translate and scale the transformed model so that the mean of all its vertices is 0 and the maximum distance of a vertex from the origin is 1.
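
Sub-step (1.2) can be illustrated with a minimal NumPy sketch. The function name `normalize_vertices` and the sample coordinates are hypothetical, and the simplification or subdivision of sub-step (1.1) is not shown.

```python
import numpy as np

def normalize_vertices(vertices):
    """Translate vertices so their mean is 0, then scale so the farthest one has norm 1."""
    v = np.asarray(vertices, dtype=float)
    centered = v - v.mean(axis=0)                 # mean of all vertices becomes 0
    max_dist = np.linalg.norm(centered, axis=1).max()
    if max_dist == 0.0:                           # degenerate model: all vertices coincide
        return centered
    return centered / max_dist                    # maximum distance from origin becomes 1

# Hypothetical example model with four vertices.
verts = np.array([[0.0, 0.0, 0.0],
                  [2.0, 0.0, 0.0],
                  [0.0, 2.0, 0.0],
                  [0.0, 0.0, 2.0]])
unit = normalize_vertices(verts)
```

Uniform scaling after centering leaves the mean at the origin, so both conditions of (1.2) hold at once.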

Further, the transformation module is composed of one static convolutional layer, one maximum pooling layer and several fully connected layers, which are used to predict a rotation matrix; the input preliminary features are then transformed by this rotation matrix.

Further, the graph convolution module includes a static convolution layer, a dynamic convolution layer, a fully connected layer and a pooling layer, wherein the features learned by each static convolution layer and each dynamic convolution layer are concatenated and input to the fully connected layer, which summarizes them, and the overall feature is obtained through the pooling layer.

Further, the feature embedding module is composed of fully connected layers.

Further, the feature embedding module is constrained by three loss functions during training: $L_{var}$ pulls features of the same category together, $L_{dist}$ pushes features of different categories farther apart, and $L_{reg}$ constrains the range of the feature embedding.

Further, in the present invention, both the static convolution layer and the dynamic convolution layer adopt an edge-conditioned convolution (Edge-Conditioned Convolution) structure.

The beneficial effects of the present invention are:

The invention proposes a mesh segmentation method based on a graph convolutional neural network. Unlike previous learning-based mesh segmentation methods, which learn features from multi-view images or voxel-based representations, the present invention uses the structure of the triangular mesh itself, applies graph convolution to the face-based graph representation, and further refines the representation by feature embedding. The present invention exploits the natural structure of meshes for representation and is lightweight in both the training and inference phases. In graph convolution, the present invention uses both static convolution and dynamic convolution to learn information from the original mesh structure and from similarity in feature space. The present invention has achieved good results on multiple mesh part segmentation data sets.

Description of drawings

FIG. 1 is a schematic diagram of the mesh segmentation process according to the present invention.

FIG. 2 shows segmentation results of the present invention, wherein adjacent segments of different categories are distinguished by black and white.

Detailed description

The idea of the present invention is to form a graph from the adjacency relationship of the faces in the mesh, learn features on this graph using a graph convolutional neural network and feature embedding, use a fully connected layer to obtain, for each face, a score for each category, and finally predict the category to which each face belongs. The method includes the following steps:

Step 1: Transform the mesh model to the specified number of faces, and perform centering and scaling operations.

Step 2: Convert the model processed in Step 1 into a graph representation, perform preliminary feature extraction for each face, and input the features into the corresponding trained graph convolutional neural network, which predicts the type of part that each face in the mesh belongs to. The graph convolutional neural network consists of a transformation module, a graph convolution module, a feature embedding module and a fully connected layer.

Step 1 is a preprocessing step, and the graph convolutional neural network structure in step 2 is shown in Figure 1.

For an input mesh model $M = \{V, F\}$, $V$ denotes the set of all vertices and $F$ the set of all faces. After feature extraction, an undirected graph $G = \{Q, E, \Phi\}$ is established. For each face $f_i \in F$, a node $q_i \in Q$ is created, and for each pair of adjacent faces $f_i$, $f_j$, an undirected edge $(q_i, q_j) \in E$ is created. $\Phi$ holds the features of the nodes: for $f_i$, the corresponding feature $\Phi_i = \{c_i, n_i, v_i, a_i\}$ consists of the centroid coordinates, normal direction, vertex coordinates and area of the face $f_i$, respectively.
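
The construction of the dual graph and of the per-face features can be sketched as follows for a triangle mesh. This is only an illustration: the function names, the 16-value feature layout (centroid, unit normal, the nine coordinates of the three vertices, area) and the tetrahedron example are assumptions, not details stated in the text.

```python
import numpy as np
from collections import defaultdict

def face_features(vertices, faces):
    """Return an (F, 16) array: centroid (3), unit normal (3), vertex coords (9), area (1).

    The 16-value layout is an assumption made for this sketch.
    """
    v = np.asarray(vertices, dtype=float)
    f = np.asarray(faces, dtype=int)
    tri = v[f]                                            # (F, 3, 3) triangle corners
    centroid = tri.mean(axis=1)
    cross = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    norm = np.linalg.norm(cross, axis=1, keepdims=True)
    area = 0.5 * norm
    normal = cross / np.where(norm > 0, norm, 1.0)        # avoid division by zero
    return np.hstack([centroid, normal, tri.reshape(len(f), 9), area])

def dual_graph_edges(faces):
    """Return sorted undirected edges (i, j), i < j, between faces sharing a mesh edge."""
    edge_to_faces = defaultdict(list)
    for fi, (a, b, c) in enumerate(faces):
        for u, w in ((a, b), (b, c), (c, a)):
            edge_to_faces[(min(u, w), max(u, w))].append(fi)
    edges = set()
    for shared in edge_to_faces.values():
        for x in range(len(shared)):
            for y in range(x + 1, len(shared)):
                edges.add((min(shared[x], shared[y]), max(shared[x], shared[y])))
    return sorted(edges)

# Hypothetical example: a tetrahedron with 4 vertices and 4 faces.
vertices = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
faces = [[0, 2, 1], [0, 1, 3], [0, 3, 2], [1, 2, 3]]
phi = face_features(vertices, faces)      # node features, one row per face
E = dual_graph_edges(faces)               # static edges of the dual graph
```

In a tetrahedron every pair of faces shares an edge, so the dual graph is complete on four nodes.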

The graph convolutional network used in the present invention uses multiple convolution layers, which adopt the basic structure of (Wang, Yue, et al. "Dynamic graph cnn for learning on point clouds." ACM Transactions on Graphics (TOG) 38.5 (2019): 1-12.). For the graph $G^l = \{Q^l, E^l, \Phi^l\}$ of the $l$-th layer, the node features are updated, in the edge convolution form of that work, as follows:

$$\phi_i^{l+1} = \max_{j:\,(q_i, q_j) \in E^l} h_\theta\left(\phi_i^l,\ \phi_j^l - \phi_i^l\right)$$

where $h_\theta$ is a nonlinear function with learnable parameters $\theta$. This update considers both the global feature $\phi_i^l$ and the local feature $\phi_j^l - \phi_i^l$, which reflects the relationship between adjacent faces.
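
One such update can be sketched in NumPy, following the edge convolution form of the cited Wang et al. work. The choice of a single linear layer followed by a leaky ReLU as the nonlinear function, the channel-wise maximum as aggregation, and all names and shapes are assumptions made for illustration.

```python
import numpy as np

def edge_conv(phi, neighbors, weight, bias, slope=0.2):
    """One edge-convolution update.

    phi:       (N, D) node features
    neighbors: list of N index lists (adjacent nodes of each node)
    weight:    (2 * D, D_out) parameters of the assumed single-layer nonlinear function
    bias:      (D_out,)
    """
    phi = np.asarray(phi, dtype=float)
    out = np.zeros((phi.shape[0], weight.shape[1]))
    for i, nbrs in enumerate(neighbors):
        if len(nbrs) == 0:
            continue                                   # isolated node: output stays zero
        center = np.repeat(phi[i][None, :], len(nbrs), axis=0)   # global part
        local = phi[nbrs] - phi[i]                                # local part
        msg = np.hstack([center, local]) @ weight + bias
        msg = np.where(msg > 0, msg, slope * msg)                 # leaky ReLU
        out[i] = msg.max(axis=0)                                  # channel-wise max
    return out

rng = np.random.default_rng(0)
phi = rng.normal(size=(4, 16))                         # hypothetical face features
neighbors = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]]
W = rng.normal(size=(32, 8)) * 0.1
b = np.zeros(8)
phi_next = edge_conv(phi, neighbors, W, b)
```

Because the aggregation is a maximum over neighbors, the result does not depend on the order in which the adjacent faces are listed.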

In static edge convolution, the edge set $E^l$ is the initial face adjacency relationship. In dynamic edge convolution, Euclidean distance in the feature space is used as the metric, and the $k$ faces closest to a given face are treated as its adjacent faces.

The transformation module combines one static convolution layer, one maximum pooling layer and several fully connected layers to predict a rotation matrix for each input feature map. The initial input features are then transformed by this rotation matrix, so that the features passed to subsequent processing have orientations that are as similar as possible.

The graph convolution module is composed of three static edge convolution layers and three dynamic edge convolution layers. Each dynamic convolution layer selects the 10 closest faces in the feature space as adjacent faces. Finally, the outputs of all layers are concatenated and input to the pooling layer to obtain an overall feature representation.
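
The neighbor selection for dynamic edge convolution can be sketched as a brute-force k-nearest-neighbor search in feature space. The function name and the sample features are hypothetical, and excluding a face from its own neighbor list is an assumption of this sketch.

```python
import numpy as np

def knn_neighbors(features, k):
    """Return an (N, k) array with the indices of the k nearest rows in feature space."""
    x = np.asarray(features, dtype=float)
    sq = np.sum(x * x, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)   # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                       # assumption: a face is not its own neighbor
    k = min(k, x.shape[0] - 1)
    return np.argsort(d2, axis=1)[:, :k]

# Hypothetical 2-D features forming two clusters.
feats = np.array([[0.0, 0.0],
                  [0.1, 0.0],
                  [5.0, 5.0],
                  [5.1, 5.0],
                  [0.0, 0.2]])
nbrs = knn_neighbors(feats, k=2)
```

Since the features change from layer to layer, this search would be repeated on the current features before each dynamic layer, whereas the static edges are computed once from the mesh.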
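
The effect of applying such a matrix can be sketched as below. In the method the matrix is predicted by the network; here a fixed rotation about the z axis stands in for it. The 16-value feature layout (centroid, normal, three vertices, area) and the decision to rotate every coordinate-like block while leaving the area unchanged are assumptions of this sketch.

```python
import numpy as np

def apply_transform(phi, R):
    """Rotate the coordinate-like parts of (N, 16) face features by the 3x3 matrix R.

    Assumed layout: columns 0-2 centroid, 3-5 normal, 6-14 three vertices, 15 area.
    """
    phi = np.asarray(phi, dtype=float)
    out = phi.copy()
    for start in (0, 3, 6, 9, 12):                     # centroid, normal, three vertices
        out[:, start:start + 3] = phi[:, start:start + 3] @ R.T
    return out                                         # area column is left untouched

theta = np.pi / 2                                      # stand-in for a predicted rotation
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
phi = np.zeros((1, 16))
phi[0, 0:3] = [1.0, 0.0, 0.0]                          # centroid on the x axis
phi[0, 3:6] = [0.0, 0.0, 1.0]                          # normal along z
phi[0, 15] = 0.5                                       # area
rotated = apply_transform(phi, R)
```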

The feature embedding module builds on the features learned by graph convolution and uses fully connected layers to predict, for each face, its representation $s_i$ in the feature space and a value $\sigma_i$ related to the size of its category in the feature space. During training, the loss function for the feature embedding module is given by:

$$L = \alpha L_{var} + \beta L_{dist} + \gamma L_{reg}$$

This loss function was proposed in (De Brabandere, Bert, Davy Neven, and Luc Van Gool. "Semantic Instance Segmentation with a Discriminative Loss Function." arXiv (2017): arXiv-1708.). Here $C$ is the number of classes, $N_c$ is the number of faces in class $c$, $R_c$ is the set of faces in class $c$, and $\mu_c$ is the mean of $s_i$ over class $c$. $\delta_v$ and $\delta_d$ are thresholds, set to 0.01 and 3, respectively. $\alpha$, $\beta$ and $\gamma$ are the weights of the three terms, set to 1, 1 and 0.001 in actual training. $c_A$ and $c_B$ denote different categories. In the above loss function, $L_{var}$ keeps each embedding close to the mean of its class, $L_{dist}$ keeps the embeddings of different classes apart, and $L_{reg}$ constrains the range of the embedding.
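
The three terms can be sketched in NumPy using the hinged forms of the cited De Brabandere et al. paper: a squared hinge on the distance of each embedding to its class mean (margin $\delta_v$), a squared hinge on the distance between class means (margin $2\delta_d$), and the mean norm of the class means. Whether the present method uses exactly these forms is an assumption, and the function name and sample data are hypothetical.

```python
import numpy as np

def discriminative_loss(s, labels, delta_v=0.01, delta_d=3.0,
                        alpha=1.0, beta=1.0, gamma=0.001):
    """Return (total, (l_var, l_dist, l_reg)) for embeddings s of shape (N, D)."""
    s = np.asarray(s, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    C = len(classes)
    means = np.stack([s[labels == c].mean(axis=0) for c in classes])

    # L_var: pull each embedding to within delta_v of its class mean.
    l_var = 0.0
    for idx, c in enumerate(classes):
        dist = np.linalg.norm(s[labels == c] - means[idx], axis=1)
        l_var += np.mean(np.maximum(dist - delta_v, 0.0) ** 2)
    l_var /= C

    # L_dist: push class means at least 2 * delta_d apart (margin as in the cited paper).
    l_dist = 0.0
    if C > 1:
        for a in range(C):
            for b in range(C):
                if a != b:
                    gap = np.linalg.norm(means[a] - means[b])
                    l_dist += np.maximum(2.0 * delta_d - gap, 0.0) ** 2
        l_dist /= C * (C - 1)

    # L_reg: keep class means close to the origin.
    l_reg = np.mean(np.linalg.norm(means, axis=1))

    total = alpha * l_var + beta * l_dist + gamma * l_reg
    return total, (l_var, l_dist, l_reg)

# Hypothetical embeddings: two tight, well-separated classes.
s = np.array([[0.0, 0.0], [0.0, 0.0], [10.0, 0.0], [10.0, 0.0]])
labels = np.array([0, 0, 1, 1])
total, (l_var, l_dist, l_reg) = discriminative_loss(s, labels)
```

For this example the variance and distance terms vanish, so only the small regularization term contributes to the total.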

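During training, each face $i$ outputs a probability of belonging to class $c$. The expression for this probability is not reproduced in this text; it depends on the mean of the category range of class $c$. Finally, a cross-entropy loss term is calculated from this probability and the true class.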

After the result of feature embedding is obtained, the final output layer takes the features learned by the preceding graph convolution layers and the result of feature embedding as input, passes them through several fully connected layers to obtain the final prediction, and computes its cross-entropy loss. Every layer except the last uses LeakyReLU as the activation function together with batch normalization. The main function of the fully connected layers is to remap the previously obtained features to the category space by weighting; that is, their output is a tensor of size (number of faces) × (number of categories), and the final output obtained after softmax is the predicted probability of each class.

The data set used in training can be obtained as follows: each labeled model is simplified so that all models have a similar number of faces (as close as possible to the specified number of faces), which yields a data set that can be used for training.

Some of the segmentation results are shown in Figure 2, from which it can be seen that the present invention segments various types of models well.

Obviously, the above embodiment is only an example given for clarity of description and does not limit the implementation. Those of ordinary skill in the art can make changes or modifications in other forms on the basis of the above description. It is neither necessary nor possible to list all implementations exhaustively here. Obvious changes or modifications derived from the above still fall within the protection scope of the present invention.
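
The last step, from a (number of faces) × (number of categories) score tensor to per-face probabilities, predicted classes and a cross-entropy loss, can be sketched as follows. The score values, labels and function names are hypothetical, and the preceding fully connected layers are not modeled.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-probability of the true class of each face."""
    n = probs.shape[0]
    return float(-np.mean(np.log(probs[np.arange(n), labels] + 1e-12)))

# Hypothetical scores for 4 faces and 3 part categories.
scores = np.array([[2.0, 0.5, 0.1],
                   [0.2, 3.0, 0.3],
                   [0.0, 0.0, 4.0],
                   [1.0, 1.0, 1.0]])
labels = np.array([0, 1, 2, 0])
probs = softmax(scores)            # predicted probability of each class, per face
pred = probs.argmax(axis=1)        # predicted part category, per face
loss = cross_entropy(probs, labels)
```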

Claims

A mesh segmentation method based on a graph convolutional network, characterized in that it comprises the following steps:

Step 1: Transform the mesh model to the specified number of faces and standardize it;

Step 2: Convert the model processed in Step 1 into a graph representation, perform preliminary feature extraction on each face, and input the features into the trained graph convolutional neural network to predict the type of part that each face in the mesh belongs to; wherein the graph convolutional neural network includes:

a transformation module, used to bring the input preliminary features to similar orientations;

a graph convolution module, used to learn, from the transformed preliminary features, features related to the adjacent faces in actual space and the adjacent faces in feature space;

a feature embedding module, used to obtain, from the features produced by the graph convolution module, embeddings in which faces of the same category are close together and faces of different categories are far apart;

an output module, used to obtain the predicted segmentation result from the features learned by the graph convolution module and the result of feature embedding.
The mesh segmentation method based on a graph convolutional network according to claim 1, wherein Step 1 is realized by the following sub-steps:

(1.1) For the input model, simplify or subdivide it to the specified number of faces;

(1.2) Translate and scale the transformed model so that the mean of all its vertices is 0 and the maximum distance of a vertex from the origin is 1.
The mesh segmentation method based on a graph convolutional network according to claim 1, wherein the transformation module is composed of one static convolutional layer, one maximum pooling layer and several fully connected layers, which are used to predict a rotation matrix, and the input preliminary features are transformed by the rotation matrix.
The mesh segmentation method based on a graph convolutional network according to claim 1, wherein the graph convolution module comprises a static convolution layer, a dynamic convolution layer, a fully connected layer and a pooling layer, wherein the features learned by the static convolution layer and the dynamic convolution layer are concatenated and input to the fully connected layer for summarization, and the overall feature is obtained through the pooling layer.
The mesh segmentation method based on a graph convolutional network according to claim 1, wherein the feature embedding module is composed of fully connected layers.
The mesh segmentation method based on a graph convolutional network according to claim 1, wherein the feature embedding module is constrained by three loss functions during training: $L_{var}$ pulls features of the same category together, $L_{dist}$ pushes features of different categories farther apart, and $L_{reg}$ constrains the range of the feature embedding.