CN111368123B - Three-dimensional model sketch retrieval method based on cross-modal guide network - Google Patents

Three-dimensional model sketch retrieval method based on cross-modal guide network

Info

Publication number
CN111368123B
Authority
CN
China
Prior art keywords
dimensional model
sketch
network
features
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010097592.9A
Other languages
Chinese (zh)
Other versions
CN111368123A (en)
Inventor
梁爽 (Liang Shuang)
戴伟东 (Dai Weidong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University filed Critical Tongji University
Priority to CN202010097592.9A priority Critical patent/CN111368123B/en
Publication of CN111368123A publication Critical patent/CN111368123A/en
Application granted granted Critical
Publication of CN111368123B publication Critical patent/CN111368123B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 — Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 — Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 — Retrieval characterised by using metadata automatically derived from the content
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/24 — Classification techniques
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/25 — Fusion techniques
    • G06F18/253 — Fusion techniques of extracted features

Abstract

The invention relates to a three-dimensional model sketch retrieval method based on a cross-modal guide network, which comprises the following steps: S1: acquiring three-dimensional model training data and sketch training data; S2: training a three-dimensional model network, and using the trained three-dimensional model network to learn a three-dimensional model feature space; S3: training a sketch network with the three-dimensional model feature space as the target space to obtain a trained sketch network; S4: extracting the features of the three-dimensional models to be retrieved and the features of the query sketch with the trained three-dimensional model network and sketch network, and performing retrieval to obtain the three-dimensional models for the corresponding application.

Description

Three-dimensional model sketch retrieval method based on cross-modal guide network
Technical Field
The invention relates to the field of sketch-based three-dimensional model retrieval, and in particular to a three-dimensional model sketch retrieval method based on a cross-modal guide network.
Background
Compared with two-dimensional images, three-dimensional models carry richer information and can reflect objective reality more comprehensively, so they are widely used in fields such as architecture and medicine. In recent years, as three-dimensional scanning, three-dimensional printing, and three-dimensional reconstruction techniques have matured, the number of three-dimensional models has grown rapidly, and how to retrieve these three-dimensional models effectively from a three-dimensional model library has become a major concern. Early retrieval methods were mainly based on keywords or on three-dimensional model examples. Keyword-based retrieval has two drawbacks: on the one hand, the three-dimensional model library must be annotated with a large number of text labels in advance, which is time-consuming and labor-intensive; on the other hand, keywords can hardly describe a user's query needs intuitively. Retrieval based on an example three-dimensional model is straightforward, but difficult to apply in practice because users rarely have a three-dimensional model available as the query input. In recent years, hand-drawn sketches have become a popular means of human-computer interaction. Compared with a three-dimensional model, a hand-drawn sketch is very easy to obtain; compared with keywords, a hand-drawn sketch expresses the user's needs more intuitively. Sketch-based three-dimensional model retrieval has therefore become a research direction that attracts much attention in the field of computer vision.
Early sketch-based three-dimensional model retrieval methods mainly relied on hand-crafted features. These methods design corresponding manual features for the sketch and the three-dimensional model respectively, and then directly measure the similarity between the cross-modal features. Examples include the Gabor local line-based feature (GALIF) method proposed by Eitz et al. and the Cross-Domain Manifold Ranking (CDMR) method proposed by Furuya et al.
In recent years, with the great success of deep learning in computer vision, a variety of deep learning methods have been applied to sketch-based three-dimensional model retrieval. Most of these methods extract features with heterogeneous twin (Siamese-style) convolutional neural networks: two networks extract deep features from the sketch and the three-dimensional model respectively, and a shared loss function then matches the features of the two modalities and performs metric learning. From the point of view of the loss function, these methods fall into two broad categories. The first treats three-dimensional model sketch retrieval as metric learning: for a sketch, several positive and negative sample pairs are constructed from sketches and three-dimensional models, and a metric loss function optimizes the network so that positive pairs are drawn together and negative pairs are pushed apart, finally aligning the cross-modal features; representative examples include the Siamese method proposed by Wang et al. and the Deep Correlated Metric Learning (DCML) method proposed by Dai et al. The second makes full use of category information and treats the task as classification: the features output by the sketch network and the three-dimensional model network are fed into a shared classifier, and a discriminative classification loss function optimizes both networks simultaneously, so that sketches and three-dimensional models of the same category are aggregated while those of different categories are separated as much as possible; representative examples include the Triplet-Center Loss (TCL) method proposed by He et al. and the "point-to-subspace" method proposed by Lei et al. Because deep neural networks can learn deeper features, deep learning methods have greatly improved the performance of three-dimensional model sketch retrieval.
However, these deep learning methods use two neural networks to extract the features of the two modalities simultaneously and then map the extracted features of the two modalities directly into a common subspace. Such direct mapping of cross-modal features makes it difficult to effectively reduce the cross-modal difference between sketches and three-dimensional models, which in turn limits cross-modal retrieval performance.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and to provide a three-dimensional model sketch retrieval method based on a cross-modal guide network that effectively alleviates the cross-modal difference problem.
The purpose of the invention can be realized by the following technical scheme:
a three-dimensional model sketch retrieval method based on a cross-modal guide network comprises the following steps:
s1: acquiring three-dimensional model training data and sketch training data;
s2: training a three-dimensional model network, and learning by using the trained three-dimensional model network to obtain a three-dimensional model characteristic space;
s3: training a sketch network by taking the three-dimensional model feature space as a target space to obtain a trained sketch network;
s4: and retrieving to-be-retrieved three-dimensional model features and query sketch features extracted by the trained three-dimensional model network and sketch network to obtain the three-dimensional model for corresponding application.
Further, the step S2 specifically includes:
s21: constructing a three-dimensional model network;
s22: using the classification loss function L_AM-S, inputting the three-dimensional model training data into the three-dimensional model network for training to obtain a trained three-dimensional model network;
s23: inputting the three-dimensional model training data into a trained three-dimensional model network, and learning to obtain three-dimensional model characteristics of all three-dimensional model training data and a classified three-dimensional model characteristic space;
s24: and calculating the class center of each class of three-dimensional model features in the three-dimensional model feature space according to the class information.
Further preferably, the classification loss function L_AM-S is the AM-softmax classification loss function, whose expression is:

L_{AM-S} = -\frac{1}{K}\sum_{k=1}^{K}\log\frac{e^{s\,(W_{y_k}^{\top} f_k - n)}}{e^{s\,(W_{y_k}^{\top} f_k - n)} + \sum_{j\neq y_k} e^{s\,W_j^{\top} f_k}}

where f_k is the three-dimensional model feature input to the classifier, W is the weight of the classifier (W_j denoting the weight vector of class j and y_k the class label of the k-th sample), K is the number of three-dimensional model samples in a training batch, n is a boundary coefficient, and s is a scaling coefficient applied after the weights and the three-dimensional model features are normalized.
Further, the step S3 specifically includes:
s31: constructing a sketch network;
s32: constructing a guide loss function L_G by using the class centers and class information of the three-dimensional model features;
S33: using the guide loss function L_G, inputting the sketch training data into the sketch network for training to obtain the trained sketch network.
The guide loss function L_G constrains the sketch features extracted by the sketch network into the three-dimensional model feature space and aligns sketch features having the same class information with the three-dimensional model features.
Further, the expression of the guide loss function L_G is:

L_G = L_c - \lambda L_a

L_c = \frac{1}{m}\sum_{i=1}^{m}\bigl(1 - \cos(f_i, c_{y_i})\bigr)

L_a = \frac{1}{m}\sum_{i=1}^{m}\sum_{j=1,\,j\neq y_i}^{N}\bigl(1 - \cos(f_i, c_j)\bigr)

where L_c is the cosine distance between the sketch features and the class centers of the three-dimensional model features of the same class, L_a is the sum of the cosine distances between the sketch features and the class centers of the three-dimensional model features of the other classes, λ is a hyper-parameter set to 0.01, m is the size of one batch of input data during sketch network training, f_i is the sketch feature of the i-th sketch, y_i is the class of the i-th sketch feature, c is a class center of the three-dimensional model features, c_{y_i} is the class center of the three-dimensional model features of the same class as the i-th sketch feature, c_j is the class center of the three-dimensional model features of a class different from that of the i-th sketch feature, and N is the total number of three-dimensional model feature classes.
Preferably, the three-dimensional model network comprises a first deep convolutional neural network CNN1 and a first fully connected layer FC1, and the sketch network comprises a second deep convolutional neural network CNN2 and a second fully connected layer FC2.
Preferably, the three-dimensional model training data includes two-dimensional view maps corresponding to all three-dimensional models in the three-dimensional model data set, and the size of the sketch training data is the same as that of the two-dimensional view map of the three-dimensional model.
The first deep convolutional neural network CNN1 extracts features from each two-dimensional view map and performs feature fusion, the first fully connected layer FC1 outputs the three-dimensional model features, and the second fully connected layer FC2 outputs the sketch features.
Further, the step S4 specifically includes:
s41: rendering all three-dimensional models to be retrieved into a two-dimensional view map;
s42: inputting a two-dimensional view map of a three-dimensional model to be retrieved into a three-dimensional model network, and extracting characteristics of the three-dimensional model to be retrieved; inputting the query sketch into a sketch network, and extracting the characteristics of the query sketch;
s43: calculating the cosine distances between the query sketch features and the features of the three-dimensional models to be retrieved, and sorting the distances;
s44: outputting, according to the sorted result, the three-dimensional model corresponding to each distance in turn to complete the three-dimensional model retrieval.
Compared with the prior art, the invention has the following advantages:
1) According to the invention, the cross-modal feature space is learned indirectly: a strongly discriminative three-dimensional model feature space is first trained in advance by exploiting the rich features of three-dimensional models, and this feature space is then used to guide the training of the sketch network and to transfer the sketch features into the three-dimensional model feature space, which effectively reduces the cross-modal difference between sketches and three-dimensional models;
2) The guide loss function of the present invention comprises two parts: L_c gathers sketch features and three-dimensional model features with the same class information as closely as possible, and L_a separates sketch features and three-dimensional model features of different classes as much as possible. Constraining the sketch network training with this guide loss function therefore aligns sketch features and three-dimensional model features that share class information more effectively, and finally improves the accuracy of sketch-based three-dimensional model retrieval.
Drawings
FIG. 1 is a schematic work flow diagram of the overall framework of the present invention;
FIG. 2 is a flow chart of a method provided in an embodiment;
FIG. 3 is a schematic diagram of the guide loss function L_G;
FIG. 4 is the PR curve of the method of the present invention and other methods on the SHREC 2013 data set;
FIG. 5 is the PR curve of the method of the present invention and other methods on the SHREC 2014 data set.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.
Examples
As shown in fig. 2, the three-dimensional model sketch retrieval method based on a cross-modal guide network provided by the present invention mainly comprises the following five steps:
1) rendering each three-dimensional model in the three-dimensional model data set into a plurality of two-dimensional view maps;
2) inputting a plurality of two-dimensional view maps into a three-dimensional model network, and representing the three-dimensional model characteristics by using the two-dimensional view map characteristics obtained by training;
3) guiding the training of the sketch network by using the three-dimensional model characteristics obtained in the step 2), and learning the sketch characteristics into a three-dimensional model characteristic space;
4) for the three-dimensional model library to be retrieved, extracting the characteristics of the three-dimensional model to be retrieved by using the three-dimensional model network trained in the step 2), and for the query sketch, extracting the characteristics of the sketch by using the sketch network trained in the step 3);
5) calculating the cosine distance between the query sketch features and the features of each three-dimensional model to be retrieved in the three-dimensional model library, and sorting the distances to complete the sketch-based three-dimensional model retrieval.
Fig. 1 shows a schematic workflow of the present invention. The method is also described in detail in the following sections of the specification.
The specific method of the step 1) comprises the following steps:
Firstly, 12 virtual cameras are placed uniformly around the three-dimensional model, i.e., one virtual camera every 30 degrees; each virtual camera then renders the three-dimensional model into a two-dimensional view map from its own viewpoint, so that each three-dimensional model finally yields 12 two-dimensional view maps.
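For illustration only, the 30-degree camera spacing described above can be expressed as a small numerical sketch in Python/NumPy; the radius and elevation values below are arbitrary assumptions, and the rendering back end that actually produces the two-dimensional view maps is not specified by this step:

    import numpy as np

    def camera_positions(radius=2.0, num_views=12, elevation_deg=30.0):
        # Place num_views virtual cameras on a circle around the model,
        # one every 360/num_views degrees (30 degrees when num_views = 12).
        elev = np.deg2rad(elevation_deg)
        positions = []
        for k in range(num_views):
            azim = np.deg2rad(k * 360.0 / num_views)
            positions.append((radius * np.cos(elev) * np.cos(azim),
                              radius * np.cos(elev) * np.sin(azim),
                              radius * np.sin(elev)))
        return positions  # each position is then used to render one view map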
The specific method of the step 2) comprises the following steps:
21) The two-dimensional view maps of the three-dimensional models are resized to a uniform 224 × 224 and normalized to the range 0-1. The 12 two-dimensional view maps of each model are then input in turn into the first deep convolutional neural network CNN1 to obtain the features of the 12 two-dimensional view maps.
22) At the end of the first deep convolutional neural network CNN1, the 12 two-dimensional view map features are fused into a single feature by an average pooling (mean-pooling) layer, and the fused feature is input into the first fully connected layer FC1 for further feature extraction.
23) The features output by the first fully connected layer FC1 are fed into a classifier, and the whole three-dimensional model network (denoted CNN1-FC1) is optimized with the AM-softmax classification loss function to obtain the trained three-dimensional model network.
The first deep convolutional neural network CNN1 may be any form of convolutional network, and the first fully connected layer FC1 may likewise be any form of fully connected network.
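A minimal, hypothetical PyTorch sketch of such a multi-view network is given below (an AlexNet backbone standing in for CNN1, mean pooling over the 12 view features, and a fully connected layer standing in for FC1 feeding a cosine classifier); the feature dimension, class count, and all names are illustrative assumptions rather than the patent's implementation:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torchvision import models

    class ModelNet3D(nn.Module):
        def __init__(self, feat_dim=512, num_classes=90):
            super().__init__()
            self.cnn1 = models.alexnet(weights=None).features   # CNN1: per-view backbone
            self.pool = nn.AdaptiveAvgPool2d((6, 6))
            self.fc1 = nn.Linear(256 * 6 * 6, feat_dim)          # FC1: 3D model feature
            self.classifier = nn.Linear(feat_dim, num_classes, bias=False)

        def forward(self, views):                 # views: (batch, 12, 3, 224, 224)
            b, v = views.shape[:2]
            x = self.cnn1(views.flatten(0, 1))    # per-view convolutional features
            x = self.pool(x).flatten(1).view(b, v, -1)
            x = x.mean(dim=1)                     # mean pooling fuses the 12 view features
            feat = self.fc1(x)                    # three-dimensional model feature
            # cosine logits: features and classifier weights are L2-normalized,
            # matching the AM-softmax setting described below
            cos = F.normalize(feat, dim=1) @ F.normalize(self.classifier.weight, dim=1).t()
            return feat, cos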
The AM-softmax classification loss function is an improvement of softmax; it reduces the cosine distances among data of the same class and increases the cosine distances among different classes, so that the learned features are more discriminative. Its expression is as follows:
L_{AM-S} = -\frac{1}{K}\sum_{k=1}^{K}\log\frac{e^{s\,(W_{y_k}^{\top} f_k - n)}}{e^{s\,(W_{y_k}^{\top} f_k - n)} + \sum_{j\neq y_k} e^{s\,W_j^{\top} f_k}}

where f_k is the three-dimensional model feature input to the classifier; W is the weight of the classifier (W_j denoting the weight vector of class j and y_k the class label of the k-th sample); K is the number of three-dimensional model samples in a training batch; n is a boundary coefficient that constrains the cosine distance margin between different classes; and s is a scaling coefficient applied after the weights and the three-dimensional model features are normalized, which helps the training of the three-dimensional model network converge.
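Under these definitions, the AM-softmax loss can be sketched as follows (PyTorch; the cosine logits are the normalized products W_j^T f_k, and the values of s and n shown here are common defaults rather than values fixed by the patent):

    import torch
    import torch.nn.functional as F

    def am_softmax_loss(cos_logits, labels, s=30.0, n=0.35):
        # cos_logits: (batch, num_classes) cosine similarities from normalized
        # classifier weights and features; n is the boundary (margin) coefficient,
        # s the scaling coefficient.
        one_hot = F.one_hot(labels, cos_logits.size(1)).float()
        logits = s * (cos_logits - n * one_hot)   # subtract the margin only for the true class
        return F.cross_entropy(logits, labels)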
24) The three-dimensional model training data are input once more into the three-dimensional model network trained in step 23), and the three-dimensional model features output by the first fully connected layer FC1 are extracted, yielding the three-dimensional model features of the training data and a class-separable three-dimensional model feature space. The class centers of the three-dimensional model features are then computed from the class information and recorded as the set C = {c_1, c_2, c_3, …, c_N}, where N denotes the number of classes in the data set.
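A minimal sketch of how the class centers c_1, ..., c_N could be computed from the extracted features; averaging the features of each class and re-normalizing the result is an assumption of this sketch, not a detail given in the text:

    import torch

    def class_centers(features, labels, num_classes):
        # features: (num_samples, feat_dim); labels: (num_samples,)
        centers = [features[labels == c].mean(dim=0) for c in range(num_classes)]
        return torch.nn.functional.normalize(torch.stack(centers), dim=1)  # C = {c_1, ..., c_N}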
The specific method of the step 3) is as follows:
31) The sketches are resized to a uniform 224 × 224 and normalized to the range 0-1, and then input into the second deep convolutional neural network CNN2 and the second fully connected layer FC2. The sketch network (denoted CNN2-FC2) has the same structure as the three-dimensional model network CNN1-FC1 in step 2), but its parameters are different.
32) Taking the three-dimensional model feature space learned in advance in step 2) as the target space, a guide loss function, denoted L_G, is constructed from the class centers and class information of the three-dimensional model features learned in advance. As shown in FIG. 3, the purpose of the guide loss function L_G is to transfer the sketch features output by the second fully connected layer FC2 into the three-dimensional model feature space so as to reduce the cross-modal feature difference, and to gather together sketch features and three-dimensional model features with the same class information so as to achieve cross-modal feature alignment. The guide loss function L_G is computed as follows:
L_G = L_c - \lambda L_a
where L_c denotes the cosine distance between the sketch features output by the second fully connected layer FC2 and the three-dimensional model class centers of the same class; as shown in FIG. 3, its role is to gather sketch features and three-dimensional model features with the same class information as closely as possible. L_a denotes the sum of the cosine distances between the sketch features output by the second fully connected layer FC2 and the centers of the other classes; as shown in FIG. 3, its role is to separate the sketch features from three-dimensional model features of different classes as far as possible. λ is a hyper-parameter used to balance the weights of the two terms and is set to 0.01.
Specifically, L_c and L_a are formulated as follows:

L_c = \frac{1}{m}\sum_{i=1}^{m}\bigl(1 - \cos(f_i, c_{y_i})\bigr)

L_a = \frac{1}{m}\sum_{i=1}^{m}\sum_{j=1,\,j\neq y_i}^{N}\bigl(1 - \cos(f_i, c_j)\bigr)

where m denotes the size of a batch input into the sketch network during training; f_i denotes the sketch feature of the i-th sketch; y_i denotes the class of the i-th sketch feature; c denotes a class center of the three-dimensional model features learned in step 2), with its subscript indicating the class to which the center belongs; and N denotes the total number of classes.
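A hedged PyTorch sketch of the guide loss under these definitions follows; taking the cosine distance as 1 − cos(·,·) and averaging both terms over the batch are assumptions of this sketch:

    import torch

    def guide_loss(sketch_feats, labels, centers, lam=0.01):
        # sketch_feats: (m, feat_dim); centers: (N, feat_dim); labels: (m,)
        f = torch.nn.functional.normalize(sketch_feats, dim=1)
        c = torch.nn.functional.normalize(centers, dim=1)
        cos = f @ c.t()                                  # (m, N) cosine similarities
        idx = torch.arange(f.size(0))
        l_c = (1.0 - cos[idx, labels]).mean()            # pull sketches toward their own class center
        mask = torch.ones_like(cos)
        mask[idx, labels] = 0.0
        l_a = ((1.0 - cos) * mask).sum(dim=1).mean()     # push sketches away from other class centers
        return l_c - lam * l_a                           # L_G = L_c - lambda * L_a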
33) Under the constraint of the guide loss function L_G, the sketch network CNN2-FC2 is trained so that the deep sketch features output by the second fully connected layer FC2 are constrained to lie in the three-dimensional model feature space while sketch features and three-dimensional model features of the same class are aligned. A shared feature space with aligned cross-modal features is finally obtained; because this space exhibits small cross-modal differences and effective cross-modal feature alignment, the goals of reducing cross-modal feature differences and aligning cross-modal features are achieved.
The specific method of the step 4) comprises the following steps:
for a three-dimensional model library to be retrieved, rendering all three-dimensional models into a two-dimensional view map by utilizing the step 1), and then utilizing the three-dimensional model network CNN trained in the step 2)1-FC1And extracting three-dimensional model characteristics from all the three-dimensional models to obtain a three-dimensional model characteristic library. For any query sketch, utilizing the sketch network CNN trained in the step 3)2-FC2And extracting sketch features of the query sketch to obtain query sketch features.
The specific method of the step 5) comprises the following steps:
The cosine distances between the query sketch features obtained in step 4) and each model feature in the three-dimensional model feature library are calculated and sorted from small to large; the three-dimensional model corresponding to each distance is then output according to the sorted result, completing the sketch-based three-dimensional model retrieval.
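A minimal sketch of this ranking step (PyTorch; the function and variable names are illustrative):

    import torch

    def retrieve(query_feat, gallery_feats, k=10):
        # query_feat: (feat_dim,) sketch feature; gallery_feats: (num_models, feat_dim)
        q = torch.nn.functional.normalize(query_feat, dim=0)
        g = torch.nn.functional.normalize(gallery_feats, dim=1)
        dist = 1.0 - g @ q                    # cosine distance to every 3D model feature
        return torch.argsort(dist)[:k]        # indices of the k closest three-dimensional models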
To verify the performance of the sketch-based three-dimensional model retrieval method proposed by the present invention, the method is applied to two widely used public standard datasets: the SHREC2013 dataset and the SHREC2014 dataset. The SHREC2013 dataset contains 1258 three-dimensional models divided into 90 categories; the number of samples per category is unbalanced, with an average of 14 three-dimensional models per category. Its sketch set contains 7200 hand-drawn sketches divided into the same 90 categories as the three-dimensional models; each category has 80 sketches, comprising 50 training samples and 30 test samples. SHREC2014 is an extension of the SHREC2013 dataset with more categories and a larger scale: it contains 8987 three-dimensional models divided into 171 categories, and its sketch set contains 13680 hand-drawn sketches in 171 categories, again with 80 sketches per category (50 training and 30 test samples). Because this dataset is larger, has more categories, a more unbalanced sample distribution, and larger intra-class variation, retrieval on it is more difficult and it provides a better measure of algorithm performance.
Experiments are carried out on the SHREC 2013 and SHREC 2014 data sets, with seven commonly used indexes as the evaluation criteria: the Precision-Recall curve (PR curve), Nearest Neighbor accuracy (NN), First Tier (FT), Second Tier (ST), E-Measure (E), Discounted Cumulative Gain (DCG), and mean Average Precision (mAP). The method is compared with other recent, state-of-the-art sketch-based three-dimensional model retrieval methods. For a fair comparison, we use the same base models as the other methods, i.e., CNN1 and CNN2 are implemented with the AlexNet, VGG16, VGG19, and ResNet-50 architectures respectively. Retrieval results and comparison on the SHREC 2013 dataset:
fig. 4 shows PR plots of the method of the present invention and other methods on the SHREC 2013 data set. As can be seen in the figure, the retrieval performance of the method is obviously superior to that of other methods. Table 1 also gives the data for the present method and other methods on the SHREC 2013 dataset under six evaluation criteria, and our method is overall superior to other leading edge methods on six criteria using the same underlying model.
Table 1 retrieval performance (%) comparison on SHREC 2013 dataset
Search results and comparisons on the SHREC 2014 dataset:
the SHREC 2014 data set is a larger and more difficult data set, and the retrieval performance of the method is also higher. Fig. 5 shows a PR plot of the method of the present invention on a SHREC 2014 data set versus other methods. It can be seen that the retrieval performance of the method is still better than that of other methods on more difficult data sets. Table 2 gives the data for the present method and other methods at six evaluation indices on the SHREC 2014 data set, and our method is still overall superior to other leading edge methods at six indices. This demonstrates that the methods herein can achieve superior search performance on difficult datasets as well.
Table 2 retrieve performance (%) comparison on SHREC 2014 data set
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and those skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A three-dimensional model sketch retrieval method based on a cross-modal guide network, characterized by comprising the following steps:
step S1: acquiring three-dimensional model training data and sketch training data;
step S2: training a three-dimensional model network, and learning by using the trained three-dimensional model network to obtain a three-dimensional model feature space, wherein the step S2 specifically includes:
s21: constructing a three-dimensional model network;
s22: using the classification loss function L_AM-S, inputting the three-dimensional model training data into the three-dimensional model network for training to obtain a trained three-dimensional model network;
s23: inputting the three-dimensional model training data into a trained three-dimensional model network, and learning to obtain three-dimensional model characteristics of all three-dimensional model training data and a classified three-dimensional model characteristic space;
s24: calculating the category center of each category of three-dimensional model features in the three-dimensional model feature space according to the category information;
step S3: training a sketch network by taking the three-dimensional model feature space as a target space to obtain the trained sketch network, wherein the step S3 specifically comprises the following steps:
s31: constructing a sketch network;
s32: constructing a guide loss function L_G by using the class centers and class information of the three-dimensional model features;
S33: using the guide loss function L_G, inputting the sketch training data into the sketch network for training to obtain a trained sketch network;
step S4: extracting the features of the three-dimensional models to be retrieved and the features of the query sketch by using the trained three-dimensional model network and the trained sketch network, and performing retrieval to obtain the three-dimensional models for the corresponding application.
2. The three-dimensional model sketch retrieval method based on the cross-modal guide network as claimed in claim 1, wherein the guide loss function L_G constrains the sketch features extracted by the sketch network into the three-dimensional model feature space and aligns sketch features having the same class information with the three-dimensional model features.
3. The method as claimed in claim 2, wherein the expression of the guide loss function L_G is:

L_G = L_c - \lambda L_a

L_c = \frac{1}{m}\sum_{i=1}^{m}\bigl(1 - \cos(f_i, c_{y_i})\bigr)

L_a = \frac{1}{m}\sum_{i=1}^{m}\sum_{j=1,\,j\neq y_i}^{N}\bigl(1 - \cos(f_i, c_j)\bigr)

where L_c is the cosine distance between the sketch features and the class centers of the three-dimensional model features of the same class, L_a is the sum of the cosine distances between the sketch features and the class centers of the three-dimensional model features of the other classes, λ is a hyper-parameter set to 0.01, m is the size of one batch of input data during sketch network training, f_i is the sketch feature of the i-th sketch, y_i is the class of the i-th sketch feature, c is a class center of the three-dimensional model features, c_{y_i} is the class center of the three-dimensional model features of the same class as the i-th sketch feature, c_j is the class center of the three-dimensional model features of a class different from that of the i-th sketch feature, and N is the total number of three-dimensional model feature classes.
4. The method as claimed in claim 1, wherein the classification loss function L_AM-S is the AM-softmax classification loss function, whose expression is:

L_{AM-S} = -\frac{1}{K}\sum_{k=1}^{K}\log\frac{e^{s\,(W_{y_k}^{\top} f_k - n)}}{e^{s\,(W_{y_k}^{\top} f_k - n)} + \sum_{j\neq y_k} e^{s\,W_j^{\top} f_k}}

where f_k is the three-dimensional model feature input to the classifier, W is the weight of the classifier (W_j denoting the weight vector of class j and y_k the class label of the k-th sample), K is the number of three-dimensional model samples in a training batch, n is a boundary coefficient, and s is a scaling coefficient applied after the weights and the three-dimensional model features are normalized.
5. The method as claimed in claim 1, wherein the three-dimensional model network comprises a first deep convolutional neural network CNN1 and a first fully connected layer FC1, and the sketch network comprises a second deep convolutional neural network CNN2 and a second fully connected layer FC2.
6. The method as claimed in claim 5, wherein the three-dimensional model training data includes the two-dimensional view maps corresponding to all three-dimensional models in the three-dimensional model data set, and the sketch training data has the same size as the two-dimensional view maps of the three-dimensional models.
7. The method as claimed in claim 6, wherein the first deep convolutional neural network CNN1 extracts features from each two-dimensional view map and performs feature fusion, the first fully connected layer FC1 outputs the three-dimensional model features, and the second fully connected layer FC2 outputs the sketch features.
8. The method for retrieving the three-dimensional model sketch based on the cross-modal guidance network as claimed in claim 7, wherein said step S4 specifically comprises:
s41: rendering all three-dimensional models to be retrieved into a two-dimensional view map;
s42: inputting a two-dimensional view map of a three-dimensional model to be retrieved into a three-dimensional model network, and extracting characteristics of the three-dimensional model to be retrieved; inputting the query sketch into a sketch network, and extracting the characteristics of the query sketch;
s43: calculating the cosine distances between the query sketch features and all the three-dimensional model features to be retrieved, and sorting the distances;
s44: outputting, according to the sorted result, the three-dimensional model corresponding to each distance in turn to complete the three-dimensional model retrieval.
CN202010097592.9A 2020-02-17 2020-02-17 Three-dimensional model sketch retrieval method based on cross-modal guide network Active CN111368123B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010097592.9A CN111368123B (en) 2020-02-17 2020-02-17 Three-dimensional model sketch retrieval method based on cross-modal guide network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010097592.9A CN111368123B (en) 2020-02-17 2020-02-17 Three-dimensional model sketch retrieval method based on cross-modal guide network

Publications (2)

Publication Number Publication Date
CN111368123A CN111368123A (en) 2020-07-03
CN111368123B true CN111368123B (en) 2022-06-28

Family

ID=71206318

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010097592.9A Active CN111368123B (en) 2020-02-17 2020-02-17 Three-dimensional model sketch retrieval method based on cross-modal guide network

Country Status (1)

Country Link
CN (1) CN111368123B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112199543B (en) * 2020-10-14 2022-10-28 哈尔滨工程大学 Confrontation sample generation method based on image retrieval model
CN113033438B (en) * 2021-03-31 2022-07-01 四川大学 Data feature learning method for modal imperfect alignment
CN113656616B (en) * 2021-06-23 2024-02-27 同济大学 Three-dimensional model sketch retrieval method based on heterogeneous twin neural network
CN113554115B (en) * 2021-08-12 2022-09-13 同济大学 Three-dimensional model sketch retrieval method based on uncertain learning
CN117473105B (en) * 2023-12-28 2024-04-05 浪潮电子信息产业股份有限公司 Three-dimensional content generation method based on multi-mode pre-training model and related components

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015201151A (en) * 2014-04-04 2015-11-12 国立大学法人豊橋技術科学大学 Three-dimensional model retrieval system, and three-dimensional model retrieval method
CN109033144A (en) * 2018-06-11 2018-12-18 厦门大学 Method for searching three-dimension model based on sketch
CN109213884A (en) * 2018-11-26 2019-01-15 北方民族大学 A kind of cross-module state search method based on Sketch Searching threedimensional model
CN110188228A (en) * 2019-05-28 2019-08-30 北方民族大学 Cross-module state search method based on Sketch Searching threedimensional model

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015201151A (en) * 2014-04-04 2015-11-12 国立大学法人豊橋技術科学大学 Three-dimensional model retrieval system, and three-dimensional model retrieval method
CN109033144A (en) * 2018-06-11 2018-12-18 厦门大学 Method for searching three-dimension model based on sketch
CN109213884A (en) * 2018-11-26 2019-01-15 北方民族大学 A kind of cross-module state search method based on Sketch Searching threedimensional model
CN110188228A (en) * 2019-05-28 2019-08-30 北方民族大学 Cross-module state search method based on Sketch Searching threedimensional model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on three-dimensional model retrieval based on deep learning (基于深度学习的三维模型检索研究); Zhang Jing (张静); Intelligent Computer and Applications (《智能计算机与应用》); 2019-05-01; Vol. 9, No. 3; full text *
End-to-end sketch-based three-dimensional model retrieval with joint feature mapping (基于联合特征映射的端到端三维模型草图检索); Bai Jing (白静); Journal of Computer-Aided Design & Computer Graphics (《计算机辅助设计与图形学学报》); 2019-12-15; full text *

Also Published As

Publication number Publication date
CN111368123A (en) 2020-07-03

Similar Documents

Publication Publication Date Title
CN111368123B (en) Three-dimensional model sketch retrieval method based on cross-modal guide network
JP2018205937A (en) Image retrieval device and program
CN111127385B (en) Medical information cross-modal Hash coding learning method based on generative countermeasure network
CN110298395B (en) Image-text matching method based on three-modal confrontation network
CN114220124A (en) Near-infrared-visible light cross-modal double-flow pedestrian re-identification method and system
CN109840290B (en) End-to-end depth hash-based dermoscope image retrieval method
CN108520213B (en) Face beauty prediction method based on multi-scale depth
CN110135459A (en) A kind of zero sample classification method based on double triple depth measure learning networks
CN106844620B (en) View-based feature matching three-dimensional model retrieval method
CN110097060A (en) A kind of opener recognition methods towards trunk image
CN111324765A (en) Fine-grained sketch image retrieval method based on depth cascade cross-modal correlation
WO2020107922A1 (en) 3d fingerprint image-based gender recognition method and system
CN114549850B (en) Multi-mode image aesthetic quality evaluation method for solving modal missing problem
CN112015868A (en) Question-answering method based on knowledge graph completion
CN115049952B (en) Juvenile fish limb identification method based on multi-scale cascade perception deep learning network
Nahar et al. Fingerprint classification using deep neural network model resnet50
CN110580339A (en) Method and device for perfecting medical term knowledge base
CN114241191A (en) Cross-modal self-attention-based non-candidate-box expression understanding method
CN114510594A (en) Traditional pattern subgraph retrieval method based on self-attention mechanism
CN111914912B (en) Cross-domain multi-view target identification method based on twin condition countermeasure network
CN115878832B (en) Ocean remote sensing image audio retrieval method based on fine pair Ji Panbie hash
CN116956138A (en) Image gene fusion classification method based on multi-mode learning
Li et al. Facial age estimation by deep residual decision making
CN113887653B (en) Positioning method and system for tight coupling weak supervision learning based on ternary network
CN113076490B (en) Case-related microblog object-level emotion classification method based on mixed node graph

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant