CN117934488A - Construction and optimization method of three-dimensional shape segmentation framework based on semi-supervision and electronic equipment - Google Patents
- Publication number: CN117934488A
- Application number: CN202311667459.2A
- Authority: CN (China)
- Prior art keywords: three-dimensional shape, data set, module, segmentation network, prediction
- Legal status: Pending
Classifications
- G06T7/10 — Image analysis: Segmentation; Edge detection
- G06N3/0464 — Neural networks: Convolutional networks [CNN, ConvNet]
- G06N3/08 — Neural networks: Learning methods
- G06V10/26 — Image preprocessing: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
- G06V10/774 — Machine-learning-based recognition: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/80 — Machine-learning-based recognition: Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature-extraction or classification level
- G06V10/82 — Machine-learning-based recognition using neural networks
- G06V20/64 — Scene-specific elements: Three-dimensional objects
- G06T2207/10004 — Image acquisition modality: Still image; Photographic image
- G06T2207/10012 — Image acquisition modality: Stereo images
- G06T2207/20081 — Special algorithmic details: Training; Learning
- G06T2207/20084 — Special algorithmic details: Artificial neural networks [ANN]
Abstract
The invention relates to a semi-supervised method for constructing and optimizing a three-dimensional shape segmentation framework, and to an electronic device. The third data set is input into the auxiliary segmentation network module to train the auxiliary segmentation network; the second data set is then input into the trained auxiliary segmentation network module, which outputs three-dimensional shapes carrying complete labels (pseudo labels), yielding a fifth data set. The main segmentation network is trained through the first data set and the fifth data set; the self-refinement module fuses the pseudo labels with the patch features output by the main segmentation network; and the calculation module computes a cross-entropy loss value from the real labels, the fused pseudo labels, and the prediction labels generated by the main segmentation network, adjusting the network parameters of the main segmentation network according to this loss value. By generating complete labels for the sparse labels in the second data set with the trained auxiliary segmentation network, the method augments the training data for the main segmentation network and avoids the cost of expensive fully labeled training data.
Description
Technical Field
The application relates to the field of three-dimensional shape segmentation, and in particular to a semi-supervised method for constructing and optimizing a three-dimensional shape segmentation framework, and to an electronic device.
Background
The task of three-dimensional shape segmentation involves dividing a three-dimensional shape into meaningful parts, which is critical for processing three-dimensional shapes efficiently. It makes the intrinsic properties of a shape, such as its topology, easier to understand. Consequently, a variety of tasks, including mesh editing, reconstruction, modeling, deformation, and shape retrieval, rely on three-dimensional shape segmentation to achieve satisfactory results. Shape segmentation has thus become one of the most active and challenging research areas.
Conventional three-dimensional shape segmentation methods typically involve three main steps. First, each face of the shape is mapped to a feature vector using a hand-crafted shape descriptor; a clustering or classification method is then applied in the feature space to assign a label to each feature vector; finally, each face of the three-dimensional shape is labeled according to the label of its feature vector. Recent advances in machine learning, however, have led to learning-based segmentation methods, particularly those built on deep learning architectures, which deliver a significant improvement in performance over traditional geometric optimization methods.
Although learning-based segmentation methods, and deep learning methods in particular, have achieved impressive results, they suffer from a major drawback: they require a large amount of fully labeled training data similar to the target shape, which imposes a significant time and cost burden for manual labeling. Furthermore, these methods typically require preparing a different set of training shapes for each category of target shape, adding to the complexity of the training process.
Disclosure of Invention
In order to overcome the limitations of current learning-based segmentation methods, reduce the complexity of the training process, and improve segmentation accuracy, one aspect of the embodiments of the present application provides a method for constructing a semi-supervised three-dimensional shape segmentation framework, wherein the three-dimensional shape segmentation framework comprises an auxiliary segmentation network module, a self-refinement module, and a main segmentation network module comprising a main segmentation network and a calculation module; the auxiliary segmentation network module comprises an auxiliary segmentation network for predicting pseudo labels, a projection module for projecting the three-dimensional shape, and a back-projection module;
the construction method comprises the following steps:
Acquiring a first data set and a second data set, wherein the data in the first data set are fully annotated three-dimensional shapes and the data in the second data set are three-dimensional shapes with sparse Scribble labels;
Sampling the reference labels of the three-dimensional shapes in the first data set to generate corresponding three-dimensional shapes with sparse Scribble labels, obtaining a third data set;
Inputting the third data set into the auxiliary segmentation network module to train the auxiliary segmentation network, wherein the projection module acquires multi-view projection images corresponding to the three-dimensional shapes in the third data set, obtaining a fourth data set, and establishes a reference matrix; the data in the reference matrix comprise, for each pixel, the vertex coordinates on the original three-dimensional shape recorded while acquiring the multi-view projection images; the auxiliary segmentation network trains on the first objective function and the multi-view projection images in the fourth data set to predict prediction labels for the two-dimensional images of the multi-view projection images; and the back-projection module maps the prediction labels back onto the corresponding three-dimensional shape patches using the reference matrix, so as to generate a complete label for each three-dimensional shape patch in the third data set;
Inputting the second data set into the trained auxiliary segmentation network module to output three-dimensional shapes containing complete labels, namely pseudo labels, obtaining a fifth data set;
Training the main segmentation network through the first data set and the fifth data set to output a corresponding patch feature for each patch of the three-dimensional shape and predict the corresponding prediction label;
Fusing the pseudo labels output by the auxiliary segmentation network module with the patch features output by the main segmentation network through the self-refinement module to obtain fused pseudo labels;
Calculating, by the calculation module, a cross-entropy loss value using the real labels, the fused pseudo labels, and the prediction labels generated by the main segmentation network, and adjusting the network parameters of the main segmentation network according to the cross-entropy loss value.
Further, the two-dimensional images of the multi-view projection images comprise depth maps and rendering maps;
The back-projection module is specifically configured to find the patch with the corresponding coordinates in the three-dimensional shape according to the vertex coordinates recorded for each pixel in the reference matrix, obtain the correspondence between the prediction labels in the two-dimensional image and the semantic label of each patch on the three-dimensional shape, and map the prediction labels onto the corresponding patches of the three-dimensional shape based on this correspondence.
Further, the auxiliary segmentation network comprises:
An Encoder module, for receiving the multi-view projection images and extracting the image feature information corresponding to each two-dimensional image in the multi-view projection images;
And a Decoder module, for performing label prediction on the corresponding two-dimensional image according to the image feature information, obtaining the prediction label corresponding to each two-dimensional image.
Further, the primary partitioning network comprises:
A main subdivision module, composed of four fully connected layers, for extracting the patch feature corresponding to each patch in the three-dimensional shape;
A Softmax module, for predicting the corresponding prediction label according to each patch's features;
The calculation module is specifically configured to:
For the first data set, calculate a cross-entropy loss value from the real labels and the prediction labels generated by the Softmax module; for the fifth data set, calculate a cross-entropy loss value from the fused pseudo labels and the prediction labels generated by the Softmax module.
Further, the self-refinement module comprises two convolution layers and is used, during prediction, to dynamically learn the weight distribution between the main segmentation network's and the auxiliary segmentation network's predictions for three-dimensional shapes with sparse Scribble labels.
Further, the formula expression of the first objective function is:

$$\theta^{*}=\arg\min_{\theta}\sum_{s^{(f)}\in F}-\log p_{aux}\left(y^{(f)}\mid x^{(f)};\theta\right)$$

where $F$ denotes the first data set and $s^{(f)}$ a three-dimensional shape patch in the first data set; $x^{(f)}$ denotes the feature vector of patch $s^{(f)}$, $y^{(f)}$ the prediction label of the auxiliary segmentation network for $x^{(f)}$, $\theta$ the network parameters of the auxiliary segmentation network, and $p_{aux}$ the prediction function of the auxiliary segmentation network.
According to another aspect of the embodiment of the present application, there is also provided an optimization method of a three-dimensional shape segmentation framework based on semi-supervision, including:
Acquiring an objective function;
Optimizing the three-dimensional shape segmentation framework through the objective function; wherein the three-dimensional shape segmentation framework is obtained by the above method for constructing a semi-supervised three-dimensional shape segmentation framework.
Further, the formula expression of the objective function is:

$$\phi^{*},\lambda^{*}=\arg\min_{\phi,\lambda}\left[\sum_{s^{(f)}\in F}-\log p_{pri}\left(y^{(f)}\mid x^{(f)};\phi\right)+\sum_{s^{(s)}\in S}-\log p_{pri}\left(q_{conv}\left(\hat{y}^{(s)},x^{(s)};\lambda\right)\mid x^{(s)};\phi\right)\right]$$

where $\phi$ denotes the network parameters of the main segmentation network, $\lambda$ the network parameters of the self-refinement module, $F$ the first data set, $p_{pri}$ the prediction of the main segmentation network, $x^{(f)}$ the feature vector of a three-dimensional shape patch in the first data set $F$, $y^{(f)}$ the prediction label of the main segmentation network for $x^{(f)}$, $S$ the second or third data set, $q_{conv}$ the dynamic weights generated by the self-refinement module, $x^{(s)}$ the feature vector of a three-dimensional shape patch in the second or third data set, $\hat{y}^{(s)}$ the prediction of the auxiliary segmentation network for $x^{(s)}$, and $s^{(f)}$ a three-dimensional shape patch in the first data set.
According to another aspect of the embodiment of the present application, there is also provided a three-dimensional shape segmentation method based on semi-supervision, including:
Obtaining an unlabeled three-dimensional shape to be segmented;
Inputting the unlabeled three-dimensional shape to be segmented into a main segmentation network for prediction to obtain a three-dimensional shape containing complete labeling; wherein the main segmentation network is trained by the above method for constructing a semi-supervised three-dimensional shape segmentation framework.
According to still another aspect of an embodiment of the present application, there is also provided an electronic device including: a processor, and a memory storing a program, the program comprising instructions that when executed by the processor cause the processor to perform the method of any of the above embodiments.
Compared with the prior art, the technical scheme provided by the application has the following technical effects:
(1) Reference labels of the three-dimensional shapes in the first data set are sampled to generate corresponding three-dimensional shapes with sparse Scribble labels, obtaining a third data set; the third data set is input into the auxiliary segmentation network module to train the auxiliary segmentation network; the second data set is input into the trained auxiliary segmentation network module to output three-dimensional shapes containing complete labels, namely pseudo labels, obtaining a fifth data set; the main segmentation network is trained through the first and fifth data sets to output a corresponding patch feature for each patch of the three-dimensional shape and predict the corresponding prediction label; the pseudo labels output by the auxiliary segmentation network module are fused with the patch features output by the main segmentation network through the self-refinement module, obtaining fused pseudo labels; and the calculation module calculates a cross-entropy loss value using the real labels, the fused pseudo labels, and the prediction labels generated by the main segmentation network, adjusting the network parameters of the main segmentation network according to this loss value. By generating complete labels for the sparse labels in the second data set with the trained auxiliary segmentation network, the method augments the training data for the main segmentation network and effectively avoids the cost of expensive fully labeled training data;
(2) According to the invention, the pseudo labels output by the auxiliary segmentation network module are fused with the patch features output by the main segmentation network through the self-refinement module, obtaining fused pseudo labels; the calculation module calculates a cross-entropy loss value using the real labels, the fused pseudo labels, and the prediction labels generated by the main segmentation network, and the network parameters of the main segmentation network are adjusted according to this loss value, which greatly improves prediction accuracy.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is evident that the drawings in the following description show only some embodiments of the invention; a person skilled in the art can obtain other embodiments from them without inventive effort.
FIG. 1 is a flow chart of a method of constructing a three-dimensional shape segmentation framework based on semi-supervision;
FIG. 2 is a flow chart of a method of optimizing a three-dimensional shape segmentation framework based on semi-supervision;
FIG. 3 is a flow chart of a semi-supervised three dimensional shape segmentation method;
FIG. 4 is a three-dimensional shape diagram with sparse Scribble labels;
FIG. 5 is a three-dimensional shape diagram with complete labeling;
FIG. 6 is a block diagram of an assisted split network module;
FIG. 7 is a block diagram of a primary split network module;
FIG. 8 is a three-dimensional shape segmentation framework diagram based on semi-supervision;
fig. 9 is a schematic structural view of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. While certain embodiments are illustrated in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided for a more thorough and complete understanding. It should be understood that the drawings and embodiments are presented for purposes of illustration only and are not intended to limit the scope of protection.
In order to better understand the embodiments of the present application, technical terms related to the embodiments of the present application are explained as follows:
Semi-supervised learning is a key research problem in pattern recognition and machine learning, and is a learning method that combines supervised and unsupervised learning. It uses a large amount of unlabeled data, together with labeled data, to perform pattern recognition tasks. Semi-supervised learning requires less manual labeling effort while achieving higher accuracy.
In order to overcome the limitations of current learning-based segmentation methods, reduce the complexity of the training process, and improve segmentation accuracy, as shown in fig. 1, the invention provides a method for constructing a semi-supervised three-dimensional shape segmentation framework, wherein the framework comprises the auxiliary segmentation network module shown in fig. 6, a self-refinement module, and the main segmentation network module shown in fig. 7, which comprises a main segmentation network and a calculation module; the auxiliary segmentation network module comprises an auxiliary segmentation network for predicting pseudo labels, a projection module for projecting the three-dimensional shape, and a back-projection module;
the construction method comprises the following steps:
Acquiring a first data set and a second data set, wherein the data in the first data set are fully annotated three-dimensional shapes as shown in fig. 5, and the data in the second data set are three-dimensional shapes with sparse Scribble labels as shown in fig. 4;
It should be noted that this embodiment obtains the first and second data sets from public benchmark data sets, including the PSB, COSEG, and ShapeNetCore data sets, which provide patch-level reference labels for each three-dimensional shape. It should further be noted that the label of each three-dimensional shape in the first data set is a real label.
This embodiment employs different data set partitioning strategies ("1+2+2", "2+1+2") for subsequent training, as sketched below. Specifically, "1+2+2" means that 20% of the fully labeled shapes in the benchmark data set (the first data set) and 40% of the shapes with sparse Scribble labels (the second data set) are selected as training data, while the remaining 40% of the shapes are retained as test data. Likewise, "2+1+2" means that 40% of the fully labeled shapes (first data set) and 20% of the shapes with sparse Scribble labels (second data set) are selected as training data, with the remaining 40% again retained as test data.
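A minimal sketch of the two partitioning strategies (illustrative only; the function name and the random shuffle are assumptions, while the 20/40/40 proportions come from the strategies above):

```python
import random

def split_dataset(shapes, labeled_frac, scribble_frac, seed=0):
    """Split a benchmark set into fully-labeled / scribble / test subsets.

    "1+2+2" corresponds to labeled_frac=0.2, scribble_frac=0.4;
    "2+1+2" corresponds to labeled_frac=0.4, scribble_frac=0.2.
    The remaining 40% is retained for testing in both cases.
    """
    shapes = shapes[:]
    random.Random(seed).shuffle(shapes)
    n_full = int(len(shapes) * labeled_frac)
    n_scrib = int(len(shapes) * scribble_frac)
    first = shapes[:n_full]                    # fully annotated (first data set)
    second = shapes[n_full:n_full + n_scrib]   # sparse-Scribble shapes (second data set)
    test = shapes[n_full + n_scrib:]           # held-out test shapes
    return first, second, test
```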
Sampling the reference labels of the three-dimensional shapes in the first data set at a preset proportion to generate corresponding three-dimensional shapes with sparse Scribble labels, obtaining a third data set; a minimal sketch of this sampling step follows.
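The sketch below simulates sparse Scribble annotation by keeping only a fraction of the per-patch reference labels; the 5% keep ratio is a hypothetical value, since the text specifies only a preset proportion:

```python
import numpy as np

def sample_scribble_labels(face_labels, keep_ratio=0.05, seed=0):
    """Keep a preset proportion of per-patch (integer) reference labels and
    mark the rest as unlabeled (-1), simulating sparse Scribble annotation."""
    rng = np.random.default_rng(seed)
    scribble = np.full_like(face_labels, -1)
    keep = rng.random(face_labels.shape[0]) < keep_ratio
    scribble[keep] = face_labels[keep]
    return scribble
```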
Inputting the third data set into the auxiliary segmentation network module to train the auxiliary segmentation network, wherein the projection module acquires multi-view projection images corresponding to the three-dimensional shapes in the third data set, obtaining a fourth data set, and establishes a reference matrix; the data in the reference matrix comprise, for each pixel, the vertex coordinates on the original three-dimensional shape recorded while acquiring the multi-view projection images; the auxiliary segmentation network trains on the first objective function and the multi-view projection images in the fourth data set to predict prediction labels for the two-dimensional images of the multi-view projection images; and the back-projection module maps the prediction labels back onto the corresponding three-dimensional shape patches using the reference matrix, so as to generate a complete label for each three-dimensional shape patch in the third data set;
Specifically, this embodiment performs a projection operation on each three-dimensional shape in the third data set: 32 virtual cameras are placed at preset positions on the spherical bounding box of the model (three-dimensional shape), and each camera is rotated four times at 90-degree intervals, generating a total of 128 sets of depth maps and rendering maps that cover almost all vertices and patches of each three-dimensional shape in the data set. A sketch of this camera layout follows.
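The sketch below generates the 128 (position, roll) pairs; the Fibonacci-sphere placement is an assumption, since the text states only that the 32 cameras sit at preset positions on the bounding sphere:

```python
import numpy as np

def camera_poses(radius, n_cameras=32, n_rolls=4):
    """Place n_cameras virtual cameras on a bounding sphere and roll each one
    n_rolls times at 90-degree intervals: 32 * 4 = 128 views per shape."""
    poses = []
    golden = np.pi * (3.0 - np.sqrt(5.0))            # golden-angle increment
    for i in range(n_cameras):
        z = 1.0 - 2.0 * (i + 0.5) / n_cameras        # evenly spaced heights
        r = np.sqrt(max(0.0, 1.0 - z * z))
        theta = golden * i
        eye = radius * np.array([r * np.cos(theta), r * np.sin(theta), z])
        for k in range(n_rolls):
            poses.append((eye, k * 90.0))            # (camera position, roll angle)
    return poses
```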
Specifically, the reference matrix is established as follows: while rendering the three-dimensional shape according to the camera pose, the vertex coordinates on the original three-dimensional shape corresponding to each pixel are recorded in an additional rendering buffer and stored into the reference matrix. A ray-casting sketch of this correspondence follows.
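As a hedged stand-in for that extra rendering buffer, the per-pixel correspondence can be sketched with ray casting; the use of trimesh and one ray per pixel in row-major order are assumptions:

```python
import numpy as np
import trimesh

def build_reference_matrix(mesh, ray_origins, ray_dirs, h, w):
    """For every pixel (one ray per pixel, row-major order), record the 3-D
    surface coordinates it sees and the index of the patch (triangle) hit."""
    ref = np.full((h, w, 3), np.nan)        # per-pixel surface coordinates
    tri = np.full((h, w), -1, dtype=int)    # per-pixel patch (face) index
    locs, idx_ray, idx_tri = mesh.ray.intersects_location(
        ray_origins, ray_dirs, multiple_hits=False)
    rows, cols = np.unravel_index(idx_ray, (h, w))
    ref[rows, cols] = locs
    tri[rows, cols] = idx_tri
    return ref, tri
```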
The two-dimensional images of the multi-view projection images comprise depth maps and rendering maps;
The auxiliary segmentation network comprises:
An Encoder module, for receiving the multi-view projection images and extracting the image feature information corresponding to each two-dimensional image in the multi-view projection images;
And a Decoder module, for performing label prediction on the corresponding two-dimensional image according to the image feature information, obtaining the prediction label corresponding to each two-dimensional image.
It should be noted that the Encoder and Decoder modules are the main modules of the DeepLabv3+ image semantic segmentation network; this embodiment combines them to generate a semantic label for each pixel of the input image, as sketched below.
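A hedged sketch of the auxiliary network using torchvision's DeepLabv3 as a stand-in (torchvision does not ship the "+" variant); the number of part labels and the view resolution are hypothetical:

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

num_parts = 8                              # hypothetical number of part labels
aux_net = deeplabv3_resnet50(weights=None, num_classes=num_parts)
aux_net.eval()

views = torch.randn(4, 3, 256, 256)        # a batch of rendered views
with torch.no_grad():
    logits = aux_net(views)["out"]         # (4, num_parts, 256, 256)
pixel_labels = logits.argmax(dim=1)        # per-pixel prediction labels
```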
The formula expression of the first objective function is:

$$\theta^{*}=\arg\min_{\theta}\sum_{s^{(f)}\in F}-\log p_{aux}\left(y^{(f)}\mid x^{(f)};\theta\right)$$

where $F$ denotes the first data set and $s^{(f)}$ a three-dimensional shape patch in the first data set; $x^{(f)}$ denotes the feature vector of patch $s^{(f)}$, $y^{(f)}$ the prediction label of the auxiliary segmentation network for $x^{(f)}$, $\theta$ the network parameters of the auxiliary segmentation network, and $p_{aux}$ the prediction function of the auxiliary segmentation network.
The back-projection module is specifically configured to search for the patch with the corresponding coordinates in the three-dimensional shape according to the vertex coordinates recorded for each pixel in the reference matrix, obtain the correspondence between the prediction labels in the two-dimensional image and the semantic label of each patch on the three-dimensional shape, and map the prediction labels onto the corresponding patches of the three-dimensional shape based on this correspondence, as sketched below.
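A sketch of the back-projection step; aggregating per-pixel votes by majority across views is an assumption, since the text specifies only the pixel-to-patch correspondence:

```python
import numpy as np

def back_project(pixel_labels, tri, n_faces, n_classes):
    """Map per-pixel prediction labels back onto three-dimensional patches:
    each pixel votes for the face recorded in the reference matrix, and each
    face takes the majority label over all views. Arrays are flattened over
    all views; tri entries of -1 mark pixels that missed the mesh."""
    votes = np.zeros((n_faces, n_classes), dtype=int)
    valid = tri >= 0
    np.add.at(votes, (tri[valid], pixel_labels[valid]), 1)
    face_labels = votes.argmax(axis=1)
    face_labels[votes.sum(axis=1) == 0] = -1  # faces never seen stay unlabeled
    return face_labels
```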
Inputting the second data set into the trained auxiliary segmentation network module to output three-dimensional shapes containing complete labels, namely pseudo labels, obtaining a fifth data set;
Training the main segmentation network through the first data set and the fifth data set to output a corresponding patch feature for each patch of the three-dimensional shape and predict the corresponding prediction label;
The primary partitioning network includes:
A main subdivision module, composed of four fully connected layers, for extracting the patch feature corresponding to each patch in the three-dimensional shape;
A Softmax module, for predicting the corresponding prediction label according to each patch's features;
The calculation module is specifically configured to:
For the first data set, calculate a cross-entropy loss value from the real labels and the prediction labels generated by the Softmax module; for the fifth data set, calculate a cross-entropy loss value from the fused pseudo labels and the prediction labels generated by the Softmax module. A sketch of this network follows.
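A sketch of the main segmentation network under assumed dimensions; the patent fixes only the four fully connected layers, the Softmax module, and the cross-entropy losses, so the input/hidden widths and the separate classifier head are assumptions:

```python
import torch
import torch.nn as nn

class PrimarySegmentationNet(nn.Module):
    """Four fully connected layers extract per-patch features; a linear head
    plus Softmax predicts the per-patch label."""
    def __init__(self, in_dim=64, feat_dim=64, num_parts=8):
        super().__init__()
        self.features = nn.Sequential(      # the four fully connected layers
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
        )
        self.classifier = nn.Linear(feat_dim, num_parts)

    def forward(self, x):                   # x: (n_patches, in_dim)
        feats = self.features(x)            # per-patch features
        logits = self.classifier(feats)
        probs = logits.softmax(dim=-1)      # Softmax module's prediction labels
        return feats, logits, probs

# Calculation module in miniature: cross entropy against the real labels
# (first data set) or the fused pseudo labels (fifth data set).
ce = nn.CrossEntropyLoss()
```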
The pseudo labels output by the auxiliary segmentation network module are fused with the patch features output by the main segmentation network through the self-refinement module, obtaining fused pseudo labels;
Since the auxiliary segmentation network module obtained from training alone is applied directly to the second data set, the accuracy of the resulting pseudo labels is very limited. Therefore, in order to improve the precision of the generated fully annotated data set, the invention introduces a self-refinement module based on CNN convolution. This module fuses the prediction results of the main segmentation network and the auxiliary segmentation network module to dynamically generate more accurate patch-level complete label data, thereby improving the prediction precision of the main segmentation network.
The self-refinement module comprises two convolution layers and dynamically learns, during prediction, the weight distribution between the main segmentation network's and the auxiliary segmentation network's predictions for three-dimensional shapes with sparse Scribble labels. For example, in the early stages of training the auxiliary segmentation network's predictions may be more accurate, so the self-refinement module may assign higher weights to the prediction labels it generates; in the later stages the main segmentation network's predictions may be more accurate, so the self-refinement module may assign higher weights to those instead. Through this dynamic adjustment mechanism, the self-refinement module is able to generate more accurate label predictions for the second data set, as sketched below.
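A sketch of the self-refinement module; treating the patches as a 1-D sequence (Conv1d), the hidden width, and the explicit blend between the two networks' predictions are assumptions beyond the stated two convolution layers and dynamic weighting:

```python
import torch
import torch.nn as nn

class SelfRefinementModule(nn.Module):
    """Two convolution layers produce a per-patch weight that blends the
    auxiliary network's pseudo labels with the main network's predictions."""
    def __init__(self, num_parts=8, feat_dim=64):
        super().__init__()
        self.weight_net = nn.Sequential(
            nn.Conv1d(num_parts + feat_dim, 64, kernel_size=1), nn.ReLU(),
            nn.Conv1d(64, 1, kernel_size=1), nn.Sigmoid(),
        )

    def forward(self, pseudo, feats, pri_probs):
        # pseudo, pri_probs: (1, num_parts, N); feats: (1, feat_dim, N)
        w = self.weight_net(torch.cat([pseudo, feats], dim=1))  # (1, 1, N)
        return w * pseudo + (1.0 - w) * pri_probs  # fused pseudo labels
```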
Calculating, by the calculation module, a cross-entropy loss value using the real labels, the fused pseudo labels, and the prediction labels generated by the main segmentation network, and adjusting the network parameters of the main segmentation network according to the cross-entropy loss value.
Reference labels of the three-dimensional shapes in the first data set are sampled to generate corresponding three-dimensional shapes with sparse Scribble labels, obtaining a third data set; the third data set is input into the auxiliary segmentation network module to train the auxiliary segmentation network; the second data set is input into the trained auxiliary segmentation network module to output three-dimensional shapes containing complete labels, namely pseudo labels, obtaining a fifth data set; the main segmentation network is trained through the first and fifth data sets to output a corresponding patch feature for each patch of the three-dimensional shape and predict the corresponding prediction label; the pseudo labels output by the auxiliary segmentation network module are fused with the patch features output by the main segmentation network through the self-refinement module, obtaining fused pseudo labels; and the calculation module calculates a cross-entropy loss value using the real labels, the fused pseudo labels, and the prediction labels generated by the main segmentation network, adjusting the network parameters of the main segmentation network accordingly. By generating complete labels for the sparse labels in the second data set with the trained auxiliary segmentation network, the method augments the training data for the main segmentation network and effectively avoids the cost of expensive fully labeled training data.
As shown in fig. 2, the embodiment of the invention further provides an optimization method of the three-dimensional shape segmentation framework based on semi-supervision, which comprises the following steps:
Acquiring an objective function;
Specifically, the objective function of the three-dimensional shape segmentation framework is constructed from:
the objective function used to train the auxiliary segmentation network, the objective function the main segmentation network would need to optimize if only the first data set were used, and the objective function of the three-dimensional shape segmentation framework when only the auxiliary segmentation network module and the main segmentation network module are used; wherein:
The formula expression of the objective function for training the auxiliary segmentation network is:

$$\theta^{*}=\arg\min_{\theta}\sum_{s^{(f)}\in F}-\log p_{aux}\left(y^{(f)}\mid x^{(f)};\theta\right)$$

where $F$ denotes the first data set and $s^{(f)}$ a three-dimensional shape patch in the first data set; $x^{(f)}$ denotes the feature vector of patch $s^{(f)}$, $y^{(f)}$ (in the $p_{aux}$ function) the prediction label of the auxiliary segmentation network for $x^{(f)}$, $\theta$ the network parameters of the auxiliary segmentation network, and $p_{aux}$ the prediction function of the auxiliary segmentation network;
The formula expression of the objective function that the main segmentation network needs to optimize using only the first data set is:

$$\phi^{*}=\arg\min_{\phi}\sum_{s^{(f)}\in F}-\log p_{pri}\left(y^{(f)}\mid x^{(f)};\phi\right)$$

where $\phi$ denotes the network parameters of the main segmentation network, $p_{pri}$ the prediction of the main segmentation network, $x^{(f)}$ the feature vector of a three-dimensional shape patch in the first data set $F$, and $y^{(f)}$ (in the $p_{pri}$ function) the prediction label of the main segmentation network for $x^{(f)}$;
The formula expression of the objective function of the three-dimensional shape segmentation framework when only the auxiliary segmentation network module and the main segmentation network module are used is:

$$\phi^{*}=\arg\min_{\phi}\left[\sum_{s^{(f)}\in F}-\log p_{pri}\left(y^{(f)}\mid x^{(f)};\phi\right)+\sum_{s^{(s)}\in S}-\log p_{pri}\left(\hat{y}^{(s)}\mid x^{(s)};\phi\right)\right]$$

Thus, the formula expression of the objective function of the three-dimensional shape segmentation framework is:

$$\phi^{*},\lambda^{*}=\arg\min_{\phi,\lambda}\left[\sum_{s^{(f)}\in F}-\log p_{pri}\left(y^{(f)}\mid x^{(f)};\phi\right)+\sum_{s^{(s)}\in S}-\log p_{pri}\left(q_{conv}\left(\hat{y}^{(s)},x^{(s)};\lambda\right)\mid x^{(s)};\phi\right)\right]$$

where $\phi$ denotes the network parameters of the main segmentation network, $\lambda$ the network parameters of the self-refinement module, $F$ the first data set, $p_{pri}$ the prediction of the main segmentation network, $x^{(f)}$ the feature vector of a three-dimensional shape patch in the first data set $F$, $y^{(f)}$ the prediction label of the main segmentation network for $x^{(f)}$, $S$ the second or third data set, $q_{conv}$ the dynamic weights generated by the self-refinement module, $x^{(s)}$ the feature vector of a three-dimensional shape patch in the second or third data set, $\hat{y}^{(s)}$ the prediction of the auxiliary segmentation network for $x^{(s)}$, and $s^{(f)}$ a three-dimensional shape patch in the first data set.
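Sketched as a training loss under the definitions above; equal weighting of the two terms and the soft-label form of the fused pseudo labels are assumptions:

```python
import torch
import torch.nn.functional as F

def framework_loss(logits_f, y_f, logits_s, fused_pseudo_s):
    """Cross entropy on real labels for the fully labeled set plus cross
    entropy on the fused (self-refined) pseudo labels for the sparse set.
    fused_pseudo_s is a soft distribution (n_patches, num_parts), accepted
    as a cross-entropy target in PyTorch >= 1.10."""
    loss_real = F.cross_entropy(logits_f, y_f)               # supervised term
    loss_pseudo = F.cross_entropy(logits_s, fused_pseudo_s)  # pseudo-label term
    return loss_real + loss_pseudo
```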
Optimizing the three-dimensional shape segmentation framework through the objective function; wherein the three-dimensional shape segmentation framework is obtained by the above method for constructing a semi-supervised three-dimensional shape segmentation framework.
As shown in fig. 3, the embodiment of the invention further provides a three-dimensional shape segmentation method based on semi-supervision, which comprises the following steps:
Obtaining an unlabeled three-dimensional shape to be segmented;
Inputting the unlabeled three-dimensional shape to be segmented into a main segmentation network for prediction to obtain a three-dimensional shape containing complete labeling; the main segmentation network is trained by the construction method of the three-dimensional shape segmentation framework based on semi-supervision.
The embodiment of the invention also provides electronic equipment, which comprises: at least one processor; and a memory communicatively coupled to the at least one processor. The memory stores a computer program executable by the at least one processor, which when executed by the at least one processor is adapted to cause an electronic device to perform a method of an embodiment of the invention.
The embodiments of the present invention also provide a non-transitory machine-readable medium storing a computer program, wherein the computer program is configured to cause a computer to perform the method of the embodiments of the present invention when executed by a processor of the computer.
The embodiments of the present invention also provide a computer program product comprising a computer program, wherein the computer program, when being executed by a processor of a computer, is for causing the computer to perform the method of the embodiments of the present invention.
With reference to fig. 9, a block diagram of an electronic device that may serve as a server or a client in embodiments of the present invention will now be described; it is an example of a hardware device to which aspects of the present invention may be applied. Electronic devices are intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant as examples only and are not intended to limit the implementations described and/or claimed herein.
As shown in fig. 9, the electronic device includes a computing unit 401 that can perform various suitable actions and processes according to a computer program stored in a read-only memory (ROM) 402 or loaded from a storage unit 408 into a random access memory (RAM) 403. The RAM 403 can also store the various programs and data required for operating the electronic device. The computing unit 401, the ROM 402, and the RAM 403 are connected to each other by a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
A number of components in the electronic device are connected to the I/O interface 405, including: an input unit 406, an output unit 407, a storage unit 408, and a communication unit 409. The input unit 406 may be any type of device capable of inputting information to the electronic device; it may receive input numeric or character information and generate key-signal inputs related to user settings and/or function control of the electronic device. The output unit 407 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. The storage unit 408 may include, but is not limited to, magnetic disks and optical disks. The communication unit 409 allows the electronic device to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers, and/or chipsets, such as Bluetooth devices, WiFi devices, WiMax devices, cellular communication devices, and the like.
The computing unit 401 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 401 include, but are not limited to, a CPU, a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 401 performs the respective methods and processes described above. For example, in some embodiments, method embodiments of the present invention may be implemented as a computer program tangibly embodied on a machine-readable medium, such as storage unit 408. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device via the ROM 402 and/or the communication unit 409. In some embodiments, the computing unit 401 may be configured to perform the above-described methods by any other suitable means (e.g., by means of firmware).
A computer program for implementing the methods of embodiments of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of embodiments of the present invention, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable signal medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
It should be noted that the term "comprising" and its variants as used in the embodiments of the present invention are open-ended, i.e. "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". References to "a" or "an" in the embodiments of the invention are intended to be illustrative rather than limiting, and those skilled in the art will understand that they should be interpreted as "one or more" unless the context clearly indicates otherwise.
User information (including but not limited to user equipment information, user personal information and the like) and data (including but not limited to data for analysis, stored data, presented data and the like) according to the embodiment of the invention are information and data authorized by a user or fully authorized by all parties, and the collection, use and processing of related data are required to comply with related laws and regulations and standards of related countries and regions, and are provided with corresponding operation entrances for users to select authorization or rejection.
The steps described in the method embodiments provided in the embodiments of the present invention may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the invention is not limited in this respect.
The term "embodiment" in this specification means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive. The various embodiments in this specification are described in a related manner, with identical and similar parts being referred to each other. In particular, for apparatus, devices, system embodiments, the description is relatively simple as it is substantially similar to method embodiments, see for relevant part of the description of method embodiments.
The above examples merely represent a few embodiments of the present invention; their description is specific and detailed, but it is not to be construed as limiting the scope of the patent claims. It should be noted that those of ordinary skill in the art can make several variations and modifications without departing from the concept of the invention, all of which fall within the protection scope of the invention. Accordingly, the protection scope of this patent should be determined by the appended claims.
Claims (10)
1. A method for constructing a semi-supervised three-dimensional shape segmentation framework, characterized in that the three-dimensional shape segmentation framework comprises an auxiliary segmentation network module, a self-refinement module, and a main segmentation network module comprising a main segmentation network and a calculation module; the auxiliary segmentation network module comprises an auxiliary segmentation network for predicting pseudo labels, a projection module for projecting the three-dimensional shape, and a back-projection module;
the construction method comprises the following steps:
Acquiring a first data set and a second data set, wherein the data in the first data set are fully annotated three-dimensional shapes and the data in the second data set are three-dimensional shapes with sparse Scribble labels;
Sampling the reference labels of the three-dimensional shapes in the first data set to generate corresponding three-dimensional shapes with sparse Scribble labels, obtaining a third data set;
Inputting the third data set into the auxiliary segmentation network module to train the auxiliary segmentation network, wherein the projection module acquires multi-view projection images corresponding to the three-dimensional shapes in the third data set, obtaining a fourth data set, and establishes a reference matrix; the data in the reference matrix comprise, for each pixel, the vertex coordinates on the original three-dimensional shape recorded while acquiring the multi-view projection images; the auxiliary segmentation network trains on a first objective function and the multi-view projection images in the fourth data set to predict prediction labels for the two-dimensional images of the multi-view projection images; and the back-projection module maps the prediction labels back onto the corresponding three-dimensional shape patches using the reference matrix, so as to generate a complete label for each three-dimensional shape patch in the third data set;
Inputting the second data set into the trained auxiliary segmentation network module to output three-dimensional shapes containing complete labels, namely pseudo labels, obtaining a fifth data set;
Training the main segmentation network through the first data set and the fifth data set to output a corresponding patch feature for each patch of the three-dimensional shape and predict the corresponding prediction label;
Fusing the pseudo labels output by the auxiliary segmentation network module with the patch features output by the main segmentation network through the self-refinement module to obtain fused pseudo labels;
Calculating, by the calculation module, a cross-entropy loss value using the real labels, the fused pseudo labels, and the prediction labels generated by the main segmentation network, and adjusting the network parameters of the main segmentation network according to the cross-entropy loss value.
2. The method for constructing a semi-supervised three-dimensional shape segmentation framework according to claim 1, wherein the two-dimensional images of the multi-view projection images comprise depth maps and rendering maps;
The back-projection module is specifically configured to find the patch with the corresponding coordinates in the three-dimensional shape according to the vertex coordinates recorded for each pixel in the reference matrix, obtain the correspondence between the prediction labels in the two-dimensional image and the semantic label of each patch on the three-dimensional shape, and map the prediction labels onto the corresponding patches of the three-dimensional shape based on this correspondence.
3. A method of constructing a semi-supervised three dimensional shape segmentation framework as defined in claim 2, wherein the auxiliary segmentation network comprises:
An Encoder module, for receiving the multi-view projection images and extracting the image feature information corresponding to each two-dimensional image in the multi-view projection images;
And a Decoder module, for performing label prediction on the corresponding two-dimensional image according to the image feature information, obtaining the prediction label corresponding to each two-dimensional image.
4. A method of constructing a semi-supervised based three dimensional shape segmentation framework as defined in claim 3, wherein the primary segmentation network comprises:
A main subdivision module, composed of four fully connected layers, for extracting the patch feature corresponding to each patch in the three-dimensional shape;
A Softmax module, for predicting the corresponding prediction label according to each patch's features;
The calculation module is specifically configured to:
For the first data set, calculate a cross-entropy loss value from the real labels and the prediction labels generated by the Softmax module; for the fifth data set, calculate a cross-entropy loss value from the fused pseudo labels and the prediction labels generated by the Softmax module.
5. The method for constructing a semi-supervised three-dimensional shape segmentation framework according to claim 4, wherein the self-refinement module comprises two convolution layers for dynamically learning, during prediction, the weight distribution between the main segmentation network's and the auxiliary segmentation network's predictions for three-dimensional shapes with sparse Scribble labels.
6. The method for constructing a semi-supervised three-dimensional shape segmentation framework according to claim 5, wherein the formula expression of the first objective function is:

$$\theta^{*}=\arg\min_{\theta}\sum_{s^{(f)}\in F}-\log p_{aux}\left(y^{(f)}\mid x^{(f)};\theta\right)$$

where $F$ denotes the first data set and $s^{(f)}$ a three-dimensional shape patch in the first data set; $x^{(f)}$ denotes the feature vector of patch $s^{(f)}$, $y^{(f)}$ the prediction label of the auxiliary segmentation network for $x^{(f)}$, $\theta$ the network parameters of the auxiliary segmentation network, and $p_{aux}$ the prediction function of the auxiliary segmentation network.
7. An optimization method of a three-dimensional shape segmentation framework based on semi-supervision is characterized by comprising the following steps:
Acquiring an objective function;
Optimizing the three-dimensional shape segmentation framework through the objective function; wherein the three-dimensional shape segmentation framework is obtained by the method for constructing a semi-supervised three-dimensional shape segmentation framework according to any one of claims 1 to 6.
8. The method for optimizing a semi-supervised three-dimensional shape segmentation framework according to claim 7, wherein the formula expression of the objective function is:

$$\phi^{*},\lambda^{*}=\arg\min_{\phi,\lambda}\left[\sum_{s^{(f)}\in F}-\log p_{pri}\left(y^{(f)}\mid x^{(f)};\phi\right)+\sum_{s^{(s)}\in S}-\log p_{pri}\left(q_{conv}\left(\hat{y}^{(s)},x^{(s)};\lambda\right)\mid x^{(s)};\phi\right)\right]$$

where $\phi$ denotes the network parameters of the main segmentation network, $\lambda$ the network parameters of the self-refinement module, $F$ the first data set, $p_{pri}$ the prediction of the main segmentation network, $x^{(f)}$ the feature vector of a three-dimensional shape patch in the first data set $F$, $y^{(f)}$ the prediction label of the main segmentation network for $x^{(f)}$, $S$ the second or third data set, $q_{conv}$ the dynamic weights generated by the self-refinement module, $x^{(s)}$ the feature vector of a three-dimensional shape patch in the second or third data set, $\hat{y}^{(s)}$ the prediction of the auxiliary segmentation network for $x^{(s)}$, and $s^{(f)}$ a three-dimensional shape patch in the first data set.
9. A semi-supervised three-dimensional shape segmentation method, comprising:
Obtaining an unlabeled three-dimensional shape to be segmented;
Inputting the unlabeled three-dimensional shape to be segmented into a main segmentation network for prediction to obtain a three-dimensional shape containing complete labeling; wherein the main segmentation network is trained by the method for constructing a semi-supervised three-dimensional shape segmentation framework according to any one of claims 1 to 6.
10. An electronic device, comprising: a processor, and a memory storing a program, characterized in that the program comprises instructions which, when executed by the processor, cause the processor to perform the method of any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311667459.2A CN117934488A (en) | 2023-12-06 | 2023-12-06 | Construction and optimization method of three-dimensional shape segmentation framework based on semi-supervision and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117934488A (en) | 2024-04-26 |
Family
ID=90758125
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311667459.2A Pending CN117934488A (en) | 2023-12-06 | 2023-12-06 | Construction and optimization method of three-dimensional shape segmentation framework based on semi-supervision and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117934488A (en) |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |