CN105243139B - Deep-learning-based three-dimensional model retrieval method and retrieval device

Info

Publication number: CN105243139B
Application number: CN201510651898.3A
Authority: CN
Inventors: 刘安安; 曹群; 聂为之; 苏育挺
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2015-10-10
Filing date: 2015-10-10
Publication date: 2018-10-23
Anticipated expiration: 2035-10-10
Also published as: CN105243139A

Abstract

The invention discloses a deep-learning-based three-dimensional (3D) model retrieval method and a retrieval device. The retrieval method includes: obtaining a set of two-dimensional views of a 3D model; training a set of filter templates for a convolutional neural network; convolving each view in the two-dimensional view set with the trained filter templates to form primary view features; using the primary view features as the input of multiple recurrent neural networks and learning advanced view features; and representing each view by its advanced view features, calculating the similarity between views of different 3D models, from that calculating the similarity between different 3D models, and sorting in descending order to obtain the final retrieval result. The retrieval device includes an acquisition module, a training module, a convolution module, a learning module, a computing module, and a sorting module. The advanced features learned by the convolutional neural network and the recurrent neural networks characterize the intrinsic structure of the data well, improving the accuracy and robustness of 3D model retrieval.

Description

A deep-learning-based three-dimensional model retrieval method and retrieval device

Technical field

The present invention relates to the fields of computer vision, neural network technology, and three-dimensional (3D) model retrieval, and in particular to a deep-learning-based 3D model retrieval method and retrieval device.

Background art

With the maturation of 3D modeling technology and the rapid development of computer hardware and the Internet, the number of 3D models on the Internet and in domain-specific databases has grown explosively. A 3D model carries more information than text or images and offers a richer, more realistic form of presentation, which has led to its increasingly wide use across many areas of production and daily life^[1]. Faced with today's large and diverse collections of 3D model data, the field urgently needs effective and efficient 3D model retrieval algorithms, and related research has become very active in recent years.

3D model retrieval algorithms fall into two broad classes: model-based and view-based. Early work concentrated on model-based retrieval, including low-level feature-based methods and high-level structure-based methods. Low-level feature-based methods describe a 3D model using geometric moments^[2], surface distributions^[3], volume information, and morphology^[4]. High-level structure-based methods require every 3D model to have explicit spatial and structural information, which limits the practical use of model-based methods. View-based methods, by contrast, do not strictly require a virtual 3D model and are therefore more flexible. In addition, two-dimensional image retrieval has been developed intensively for decades and has accumulated a rich technical foundation, so view-based methods can reuse existing techniques to retrieve effectively^[5].

Most 3D model retrieval algorithms use hand-designed features, such as SIFT (scale-invariant feature transform), which is based on orientation histograms, and SURF (speeded-up robust features). These features have limitations; for example, SIFT can only be extracted from grayscale images and cannot be applied to multi-modal domains such as RGB-D (three color channels plus depth)^[6]. In recent years, deep learning algorithms have become very popular in feature learning: through iterative training they can learn highly robust features for representing different objects^[7].

Summary of the invention

The present invention provides a deep-learning-based 3D model retrieval method and retrieval device. Using deep learning, the learned features characterize the intrinsic structure of the data well, improving the accuracy and robustness of 3D model retrieval, as described below:

A deep-learning-based 3D model retrieval method, comprising the following steps:

obtaining a set of two-dimensional views of a 3D model, and training a set of filter templates for a convolutional neural network;

convolving each view in the two-dimensional view set with the trained filter templates to form primary view features;

using the primary view features as the input of multiple recurrent neural networks, and learning advanced view features;

representing each view by its advanced view features, calculating the similarity between views of different 3D models, from that calculating the similarity between different 3D models, and sorting in descending order to obtain the final retrieval result.

The step of convolving each view in the two-dimensional view set with the trained filter templates to form primary view features specifically includes:

applying pooling to reduce the dimensionality of the convolved features, selecting a region of a given size as the pooling region, and taking the pooled convolution features as the primary view features.

The step of representing each view by its advanced view features and calculating the similarity between views of different 3D models is specifically defined as follows:

v and w are any two views taken from two different 3D models; a feature mapping function maps each view to its feature representation; and P(v, w) is the similarity between views v and w.

The step of calculating the similarity between different 3D models is specifically defined as follows:

Q and M are two different 3D models; n and m are the numbers of views in Q and M, respectively; P(v_i, w_j) is the similarity between the i-th view in Q and the j-th view in M; and S(Q, M) is the similarity between 3D models Q and M.

A deep-learning-based 3D model retrieval device, comprising:

an acquisition module, configured to obtain a set of two-dimensional views of a 3D model;

a training module, configured to train a set of filter templates for a convolutional neural network;

a convolution module, configured to convolve each view in the two-dimensional view set with the trained filter templates to form primary view features;

a learning module, configured to use the primary view features as the input of multiple recurrent neural networks and learn advanced view features;

a computing module, configured to represent each view by its advanced view features, calculate the similarity between views of different 3D models, and from that calculate the similarity between different 3D models;

a sorting module, configured to sort in descending order to obtain the final retrieval result.

The convolution module includes:

a processing submodule, configured to apply pooling to reduce the dimensionality of the convolved features;

a selection submodule, configured to select a region of a given size as the pooling region and take the pooled convolution features as the primary view features.

The beneficial effects of the technical solution provided by the present invention are as follows. Interest points are extracted with the SIFT descriptor and patches of interest are then extracted around them, which greatly reduces the redundancy of sub-block information in object views while improving the effectiveness of each patch of interest. The advanced features learned by the convolutional neural network and the recurrent neural networks characterize the intrinsic structure of the data well, improving the accuracy and robustness of 3D model retrieval.

Brief description of the drawings

Fig. 1 is a flow chart of the deep-learning-based 3D model retrieval method;

Fig. 2 compares the precision-recall curves of three kinds of features on the ETH (Swiss Federal Institute of Technology Zurich) database;

Fig. 3 compares the precision-recall curves of three methods on the ETH database;

Fig. 4 is a structural diagram of the deep-learning-based 3D model retrieval device;

Fig. 5 is a schematic diagram of the convolution module.

In the drawings, the reference numerals denote the following components:

1: acquisition module; 2: training module;

3: convolution module; 4: learning module;

5: computing module; 6: sorting module;

31: processing submodule; 32: selection submodule.

Detailed description

To make the objectives, technical solutions, and advantages of the present invention clearer, embodiments of the present invention are described in further detail below.

Embodiment 1

An embodiment of the present invention provides a deep-learning-based 3D model retrieval method. Referring to Fig. 1, the method includes the following steps:

101: Obtain a set of two-dimensional views of a 3D model; train a set of filter templates for a convolutional neural network;

102: Convolve each view in the two-dimensional view set with the trained filter templates to form primary view features;

103: Use the primary view features as the input of multiple recurrent neural networks and learn advanced view features;

104: Represent each view by its advanced view features, calculate the similarity between views of different 3D models, from that calculate the similarity between different 3D models, and sort in descending order to obtain the final retrieval result.

In summary, through steps 101 to 104, this embodiment learns advanced features with a convolutional neural network and recurrent neural networks that characterize the intrinsic structure of the data well, improving the accuracy and robustness of 3D model retrieval.

Embodiment 2

The scheme of Embodiment 1 is described in detail below with reference to specific calculation formulas and examples:

201: Obtain a set of two-dimensional views of the 3D model;

The view set of a 3D model is a group of two-dimensional views that together represent the model from different directions and positions. These views can be obtained by photographing a real 3D model with a camera. The specific procedure is well known to those skilled in the art and is not described further here.

202: Train a set of filter templates for the convolutional neural network;

In a specific implementation, any commonly used feature point descriptor may be used to obtain all interest points in a view. Without loss of generality, this embodiment uses the widely used scale-invariant feature transform^[8] (SIFT) descriptor. The SIFT features it extracts are local image features that are invariant to rotation, scaling, and brightness changes, and that remain stable to a certain degree under viewpoint changes, affine transformations, and noise. SIFT interest point extraction is applied to all views of all 3D models to build an interest point set.

Taking each interest point as the center pixel, the region is extended by 4 pixels to the left, right, top, and bottom of the center pixel, giving a patch of interest of size 9 × 9; these patches form the patch-of-interest set.
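The patch extraction described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `extract_patches`, the `(row, col)` point format, and the policy of skipping interest points that lie too close to the image border are all assumptions, since the source does not say how border points are handled.

```python
import numpy as np

def extract_patches(view, points, half=4):
    """Cut (2*half+1)-square patches centred on each (row, col) interest point.

    Points closer than `half` pixels to the border are skipped; this border
    policy is an assumption, not something the source specifies.
    """
    h, w = view.shape
    size = 2 * half + 1
    patches = []
    for r, c in points:
        if half <= r < h - half and half <= c < w - half:
            patches.append(view[r - half:r + half + 1, c - half:c + half + 1])
    if not patches:
        return np.empty((0, size, size), dtype=view.dtype)
    return np.stack(patches)

# Toy 20 x 20 "view" and three hypothetical interest points.
view = np.arange(20 * 20, dtype=float).reshape(20, 20)
patches = extract_patches(view, [(10, 10), (2, 5), (4, 15)])
print(patches.shape)  # (2, 9, 9): the point at row 2 is too close to the border
```

In practice the interest points would come from a SIFT detector; they are hard-coded here to keep the sketch self-contained.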

Brightness and contrast normalization is applied to all patches of interest. Specifically, for each patch of interest, its mean is subtracted and the result is divided by its standard deviation, which yields the normalized patches of interest. Here x⁽ⁱ⁾ is the i-th patch of interest after normalization, computed from the i-th unnormalized patch of interest; mean denotes taking the average; and var denotes taking the standard deviation.
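A minimal sketch of this per-patch normalization is given below. The flattening of each 9 × 9 patch into an 81-element row and the small constant `eps` that guards against division by zero are assumptions added for the illustration; neither appears in the source.

```python
import numpy as np

def normalize_patches(patches, eps=1e-8):
    """Per-patch normalisation: subtract the patch mean, divide by its std."""
    flat = patches.reshape(len(patches), -1).astype(float)
    mean = flat.mean(axis=1, keepdims=True)
    std = flat.std(axis=1, keepdims=True)
    return (flat - mean) / (std + eps)  # eps is an added safeguard (assumption)

rng = np.random.default_rng(0)
raw = rng.uniform(0, 255, size=(5, 9, 9))   # five synthetic 9 x 9 patches
x = normalize_patches(raw)
print(x.shape)  # (5, 81)
```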

The normalized patches of interest are then whitened to remove correlations in the data. Without loss of generality, this embodiment uses ZCA (zero-phase component analysis, a variant of principal component analysis) whitening^[9]. According to the formula

cov(x⁽ⁱ⁾) = V D V^-1

the covariance matrix of the patches of interest is computed and eigendecomposed, where cov denotes taking the covariance; D is a diagonal matrix whose diagonal entries are the eigenvalues; V is the matrix of eigenvectors corresponding to D; and V^-1 is the inverse of V.

Applying the whitening formula yields the whitened patches of interest. In the formula, ε_zca is a small constant; this embodiment uses ε_zca = 0.1.
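The whitening formula itself is not reproduced in the source text. The sketch below uses the standard ZCA transform, W = V (D + ε_zca I)^(-1/2) V^-1, which is consistent with the eigendecomposition and the constant ε_zca described above but should be read as an assumption. Because the covariance matrix is symmetric, V is orthogonal and V^-1 equals the transpose of V.

```python
import numpy as np

def zca_whiten(x, eps_zca=0.1):
    """ZCA-whiten patches stored as rows of x, shape (n_patches, n_pixels)."""
    x = x - x.mean(axis=0)                  # centre each pixel dimension (assumed)
    cov = np.cov(x, rowvar=False)           # covariance across patches
    d, v = np.linalg.eigh(cov)              # cov = V diag(d) V^T
    w = v @ np.diag(1.0 / np.sqrt(d + eps_zca)) @ v.T
    return x @ w, w

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 6)) @ rng.normal(size=(6, 6))  # correlated toy data
white, w = zca_whiten(data)
print(white.shape)  # (500, 6)
```

A larger ε_zca damps directions with small eigenvalues more strongly, which reduces the amplification of noise at the cost of leaving some residual correlation.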

All patches of interest are then clustered with a commonly used clustering algorithm. Without loss of generality, this embodiment uses the classical K-means^[10] clustering method, which proceeds as follows. First, the number of clusters K is fixed and K patches of interest are chosen as the initial cluster centers. Each remaining patch of interest is assigned to the class of its nearest cluster center. The mean of the patches of interest in each class is then recomputed to form the new cluster centers. This process is repeated until the clustering converges. Each cluster center is one filter template, and the set of all cluster centers constitutes a set of K filter templates. In this embodiment, K = 128.
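The clustering loop above can be sketched in a few lines of NumPy. The random choice of initial centers, the iteration cap `n_iter`, and the handling of empty clusters (keeping the previous center) are assumptions of this sketch; the toy data uses two well-separated groups and K = 2 instead of K = 128.

```python
import numpy as np

def kmeans(x, k, n_iter=100, seed=0):
    """Plain K-means on rows of x; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]  # K initial patches
    for _ in range(n_iter):
        # squared Euclidean distance from every patch to every centre
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):   # converged
            break
        centers = new_centers
    return centers, labels

rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0.0, 0.1, size=(50, 81)),
               rng.normal(5.0, 0.1, size=(50, 81))])
centers, labels = kmeans(x, k=2)
filters = centers.reshape(2, 9, 9)   # each centre becomes one 9 x 9 template
print(filters.shape)  # (2, 9, 9)
```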

203: Convolve each view from step 201 with the trained filter templates to form primary view features;

Each view from step 201 is scaled to the same size d_i × d_i, where d_i is the side length of each view after scaling. The set of filter templates trained in step 202, of size K × d_f × d_f where d_f is the side length of each filter template, is then convolved with each scaled view using a convolution stride s_f. The size of the convolved features is K × ((d_i − d_f)/s_f + 1) × ((d_i − d_f)/s_f + 1). This embodiment uses d_i = 148, d_f = 9, and s_f = 1, so the convolved feature size is 128 × 140 × 140.
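The size arithmetic and the filtering step can be sketched as below. The helper names are hypothetical, the templates are applied without flipping (cross-correlation, the usual convention in convolutional networks, assumed here), and a small 20 × 20 view with 4 templates stands in for the 148 × 148 views and 128 templates of the embodiment to keep the example fast.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_out_side(d_i, d_f, s_f):
    """Side length of the feature map after a valid convolution."""
    return (d_i - d_f) // s_f + 1

def convolve_view(view, filters, s_f=1):
    """Filter a (d_i, d_i) view with (K, d_f, d_f) templates, stride s_f."""
    d_f = filters.shape[1]
    # windows has shape (r, r, d_f, d_f): one d_f x d_f window per output pixel
    windows = sliding_window_view(view, (d_f, d_f))[::s_f, ::s_f]
    return np.einsum("ijab,kab->kij", windows, filters)   # (K, r, r)

rng = np.random.default_rng(0)
view = rng.normal(size=(20, 20))        # small stand-in for a 148 x 148 view
filters = rng.normal(size=(4, 9, 9))    # stand-in for the K = 128 templates
feat = convolve_view(view, filters)
print(feat.shape)                 # (4, 12, 12)
print(conv_out_side(148, 9, 1))   # 140
```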

Pooling is then applied to reduce the dimensionality of the convolved features. A region of a given size is selected as the pooling region, and the mean of the convolution features over that region is taken as the new feature value. If the convolved feature size is K × r_c × r_c, where r_c is the side length of each convolved feature map, the pooling block size is d_q × d_q, where d_q is the side length of the pooling block, and the pooling stride is s_q, then the pooled feature size is K × ((r_c − d_q)/s_q + 1) × ((r_c − d_q)/s_q + 1). This embodiment uses d_q = 10, s_q = 5, and r_c = 140, so the final pooled convolution features have size 128 × 27 × 27; these are the primary features of the view.
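The average pooling step with the embodiment's parameters can be sketched as follows; the function name `average_pool` is hypothetical and random data stands in for real convolution features.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def average_pool(feat, d_q=10, s_q=5):
    """Average-pool a (K, r_c, r_c) feature map with d_q x d_q blocks, stride s_q."""
    windows = sliding_window_view(feat, (d_q, d_q), axis=(1, 2))[:, ::s_q, ::s_q]
    return windows.mean(axis=(-2, -1))

feat = np.random.default_rng(0).normal(size=(128, 140, 140))
pooled = average_pool(feat)
print(pooled.shape)  # (128, 27, 27)
```

With r_c = 140, d_q = 10, and s_q = 5, each side yields (140 − 10)/5 + 1 = 27 pooled values, matching the size stated in the text.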

204: Use the primary view features obtained in step 203 as the input of multiple recurrent neural networks and learn advanced view features;

Taking a single recurrent neural network as an example, the primary view features obtained in step 203 are divided into blocks. If the primary features of each view have size K × r_p × r_p, where r_p is the side length of each primary feature map, and each block has size K × d_b × d_b, where d_b is the side length of each block, the features are divided into (r_p/d_b) × (r_p/d_b) blocks, which serve as the bottom-layer input of the single recurrent neural network.

According to the formula, the output of each block is computed separately from the weight matrix W and the primary view features in that block through the activation function f, and is used as the input to the next layer. Here f is usually chosen to be the tanh or sigmoid function, and W is a randomly generated weight matrix. The result for the i-th block is the i-th output of the first layer.

The outputs of the first layer are used as the input to the next layer and the process above is repeated, finally producing a K-dimensional advanced view feature.

N recurrent neural networks produce N K-dimensional advanced view features, which are concatenated into an NK-dimensional advanced view feature.

In this embodiment, r_p = 27, d_b = 3, N = 64, and f is the sigmoid function:

y = 1 / (1 + e^(−x))

where x is the input feature and y is the output feature after the sigmoid transformation. The advanced view feature finally learned therefore has 64 × 128 = 8192 dimensions.
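The layer-by-layer merging of blocks can be sketched as below. Several details are assumptions, because the source formula is not reproduced: each block of size K × d_b × d_b is flattened and multiplied by a K × (d_b·d_b·K) weight matrix, the same matrix is reused on every layer, and there is no bias term. The demo uses K = 8 and N = 4 to stay small; with K = 128 and N = 64 the same code would produce 8192 values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rnn_forward(feat, w, d_b=3):
    """Merge d_b x d_b neighbouring blocks layer by layer until one K-vector remains.

    feat: (K, r, r) primary features with r a power of d_b.
    w:    (K, d_b*d_b*K) random weights, shared by all layers (assumption).
    """
    k = feat.shape[0]
    while feat.shape[1] > 1:
        r = feat.shape[1] // d_b
        # split into r x r blocks of shape (K, d_b, d_b) and flatten each block
        blocks = feat.reshape(k, r, d_b, r, d_b).transpose(1, 3, 0, 2, 4)
        blocks = blocks.reshape(r, r, -1)                  # (r, r, K*d_b*d_b)
        feat = sigmoid(blocks @ w.T).transpose(2, 0, 1)    # (K, r, r)
    return feat.reshape(k)

def advanced_features(feat, weights, d_b=3):
    """Concatenate the outputs of N independent networks into one N*K vector."""
    return np.concatenate([rnn_forward(feat, w, d_b) for w in weights])

rng = np.random.default_rng(0)
K, N = 8, 4
feat = rng.normal(size=(K, 27, 27))                # stand-in primary features
weights = rng.normal(scale=0.1, size=(N, K, 9 * K))
h = advanced_features(feat, weights)
print(h.shape)  # (32,)
```

With r_p = 27 and d_b = 3 the spatial side shrinks 27 → 9 → 3 → 1, so three merging layers are enough to reach a single K-dimensional vector.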

205: Represent each view by the advanced view features obtained in step 204, calculate the similarity between views of different 3D models, from that calculate the similarity between different 3D models, and sort in descending order to obtain the final retrieval result;

(1) The similarity between views of different 3D models is calculated according to the view similarity formula, where v and w are any two views taken from two different 3D models; a feature mapping function maps each view to its feature representation; and P(v, w) is the similarity between views v and w. In this embodiment, the feature representations of views v and w are the advanced view features learned in step 204.

(2) The similarity between different 3D models is calculated according to the model similarity formula, where Q and M are two different 3D models; n and m are the numbers of views in Q and M, respectively; P(v_i, w_j) is the similarity between the i-th view in Q and the j-th view in M; and S(Q, M) is the similarity between 3D models Q and M. Sorting by similarity from largest to smallest gives the ranked retrieval result for the required 3D model.
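Neither similarity formula is reproduced in the source text, so the sketch below only illustrates the overall flow of step 205. Both choices in it are assumptions made for illustration: P(v, w) is taken to be a Gaussian kernel on the feature vectors, and S(Q, M) is taken to be the mean of P(v_i, w_j) over all n × m view pairs. The function names and the toy gallery are hypothetical.

```python
import numpy as np

def view_similarity(fv, fw, sigma=1.0):
    """Assumed Gaussian-kernel similarity between two view feature vectors."""
    return np.exp(-np.sum((fv - fw) ** 2) / (2 * sigma ** 2))

def model_similarity(q_views, m_views, sigma=1.0):
    """Assumed aggregation: mean of P(v_i, w_j) over all n*m view pairs."""
    return float(np.mean([[view_similarity(v, w, sigma) for w in m_views]
                          for v in q_views]))

def retrieve(query, gallery, sigma=1.0):
    """Rank gallery models by similarity to the query, largest first."""
    scores = {name: model_similarity(query, views, sigma)
              for name, views in gallery.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

rng = np.random.default_rng(0)
query = rng.normal(0.0, 0.1, size=(3, 4))            # 3 views, 4-dim features
gallery = {"near": rng.normal(0.0, 0.1, size=(2, 4)),
           "far": rng.normal(3.0, 0.1, size=(4, 4))}
ranking = retrieve(query, gallery)
print([name for name, _ in ranking])  # ['near', 'far']
```

Any other view-level similarity or aggregation rule can be swapped in without changing the ranking step, which only needs one score per gallery model.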

In conclusion the embodiment of the present invention through the above steps 201- steps 205 realize through convolutional neural networks and The advanced features that recurrent neural network learns can characterize data immanent structure rule well, improve three-dimensional model search Accuracy and robustness.

Embodiment 3

The feasibility of the schemes in Embodiments 1 and 2 is verified below with reference to Fig. 2, Fig. 3, the calculation formulas, and experimental data:

1) Experimental database

The database used in the experiments is the publicly shared ETH database, which contains 80 3D models in 8 classes of 10 objects each: apple, car, cow, cup, dog, horse, pear, and tomato.

2) Evaluation criteria

The evaluation criterion used in the experiments is the precision-recall curve, which assesses 3D model retrieval performance in terms of average recall (AR) and average precision (AP).

AR and AP are computed with the following formulas and used to plot the precision-recall curve:

Recall = N_z / N_r

where Recall is the recall, N_z is the number of correctly retrieved objects, and N_r is the number of all relevant objects.

Precision = N_z / N_all

where Precision is the precision and N_all is the number of all retrieved objects.

AR = (1/N_m) · Σ_{i=1..N_m} Recall(i)

where AR is the average recall, N_m is the number of 3D model classes, and Recall(i) is the recall of the i-th class.

AP = (1/N_m) · Σ_{i=1..N_m} Precision(i)

where AP is the average precision and Precision(i) is the precision of the i-th class.
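The four quantities defined above can be sketched as follows. The per-class counts are hypothetical numbers chosen only to exercise the functions; they are not experimental results from the source.

```python
def recall(n_z, n_r):
    """Correctly retrieved objects divided by all relevant objects."""
    return n_z / n_r

def precision(n_z, n_all):
    """Correctly retrieved objects divided by all retrieved objects."""
    return n_z / n_all

def average(values):
    values = list(values)
    return sum(values) / len(values)

# Hypothetical per-class counts: (N_z, N_r, N_all)
counts = [(8, 10, 10), (5, 10, 10), (9, 10, 12)]
ar = average(recall(z, r) for z, r, _ in counts)
ap = average(precision(z, a) for z, _, a in counts)
print(round(ar, 4), round(ap, 4))  # 0.7333 0.6833
```

Sweeping the number of retrieved objects and recomputing AR and AP at each cutoff gives the points of a precision-recall curve.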

3) Features used for comparison

Zernike moment features^[11]: Zernike moments are one kind of image feature descriptor. They are invariant to translation, scaling, and rotation of the view and are used in many kinds of object recognition and model analysis.

HSV (hue, saturation, value) color features^[12]: the image is divided into several regions according to color information and the color range is divided into multiple sub-intervals; each region undergoes color space quantization to build a color index, and the indices are combined to form the HSV feature.

4) Methods used for comparison

The experiments compare the present method with the following two methods:

Weighted bipartite graph^[13] algorithm: a weighted bipartite graph is built from selected representative views, and the similarity is computed with an improved Hungarian algorithm.

Hypergraph^[14] algorithm: hyperedges are generated by clustering views to build multiple hypergraphs, from which the similarity between different 3D models is computed.

5) Experimental results

The precision-recall curves of the three kinds of features on the ETH database are compared in Fig. 2, where the vertical axis is precision and the horizontal axis is recall. The larger the area enclosed by the precision-recall curve and the two axes, the better the retrieval performance. Compared with the traditional Zernike moment features and HSV color features, the features of the present method improve retrieval performance markedly, showing that features based on deep learning are superior and better characterize the intrinsic structure of images.

The precision-recall curves of the three methods on the ETH database are compared in Fig. 3. Even against two strong graph-based algorithms, the retrieval performance of the present deep-learning-based method is better than that of both the weighted bipartite graph method and the hypergraph method.

In conclusion the embodiment of the present invention demonstrates implementation by above-mentioned experimentation, experimental data and simulation waveform The feasibility of scheme in example 1 and 2, method for searching three-dimension model provided in an embodiment of the present invention meet a variety of in practical application It needs.

Embodiment 4

A deep-learning-based 3D model retrieval device. Referring to Fig. 4, the 3D model retrieval device includes:

an acquisition module 1, configured to obtain a set of two-dimensional views of a 3D model;

a training module 2, configured to train a set of filter templates for a convolutional neural network;

a convolution module 3, configured to convolve each view in the two-dimensional view set with the trained filter templates to form primary view features;

a learning module 4, configured to use the primary view features as the input of multiple recurrent neural networks and learn advanced view features;

a computing module 5, configured to represent each view by its advanced view features, calculate the similarity between views of different 3D models, and from that calculate the similarity between different 3D models;

a sorting module 6, configured to sort in descending order to obtain the final retrieval result.

Referring to Fig. 5, the convolution module 3 includes:

a processing submodule 31, configured to apply pooling to reduce the dimensionality of the convolved features;

a selection submodule 32, configured to select a region of a given size as the pooling region and take the pooled convolution features as the primary view features.

The embodiments of the present invention do not limit the entity that implements the above modules and submodules; it may be any device with computing capability, such as a microcontroller or a PC, as long as it can perform the above functions.

In summary, through the above modules and submodules, this embodiment learns advanced features with a convolutional neural network and recurrent neural networks that characterize the intrinsic structure of the data well, improving the accuracy and robustness of 3D model retrieval.

Unless otherwise specified, the embodiments of the present invention do not limit the model of each device, as long as the device can perform the above functions.

References

[1] Zhang Fei. Research and implementation of 3D model feature extraction and relevance feedback algorithms [D]. Northwest University, 2010.

[2] J. Tangelder and R. Veltkamp, "Polyhedral model retrieval using weighted point sets," Int. J. Image Graph., vol. 3, no. 1, pp. 209–229, 2003.

[3] R. Osada, T. Funkhouser, B. Chazelle, and D. Dobkin, "Shape distributions," ACM Trans. Graph., vol. 21, no. 4, pp. 807–832, 2002.

[4] A. E. Johnson and M. Hebert, "Using spin images for efficient object recognition in cluttered 3D scenes," IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 5, pp. 433–449, May 1999.

[5] P. Daras and A. Axenopoulos, "A 3D shape retrieval framework supporting multimodal queries," Int. J. Comput. Vis., vol. 89, nos. 2–3, pp. 229–247, 2010.

[6] Y. Cheng, X. Zhao, K. Huang, and T. Tan, "Semi-supervised learning for RGB-D object recognition," in ICPR, pp. 2377–2382, 2014.

[7] F. Wang, L. Lin, and M. Tang, "A new sketch-based 3D model retrieval approach by using global and local features," Graphical Models, vol. 76, no. 3, pp. 128–139, 2014.

[8] D. G. Lowe, "Object recognition from local scale-invariant features," Proceedings of the International Conference on Computer Vision, vol. 2, pp. 1150–1157, 1999. doi:10.1109/ICCV.1999.790410.

[9] A. Krizhevsky, "Learning multiple layers of features from tiny images," 2009.

[10] J. A. Hartigan and M. A. Wong, "Algorithm AS 136: A K-means clustering algorithm," Journal of the Royal Statistical Society, Series C, vol. 28, no. 1, pp. 100–108, 1979. JSTOR 2346830.

[11] A. Khotanzad and Y. Hong, "Invariant image recognition by Zernike moments," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 5, pp. 489–497, May 1990.

[12] Wang Yanlin. Implementation of an image retrieval algorithm based on HSV color features in Matlab [J]. Computer Programming Skills & Maintenance, 2013: 86-87. DOI: 10.3969/j.issn.1006-4052.2013.16.037.

[13] Y. Gao, Q. Dai, M. Wang, and N. Zhang, "3D model retrieval using weighted bipartite graph matching," Signal Process., Image Commun., vol. 26, no. 1, pp. 39–47, 2011.

[14] Y. Gao, M. Wang, D. Tao, R. Ji, and Q. Dai, "3D object retrieval and recognition with hypergraph analysis," IEEE Trans. Image Process., vol. 21, no. 9, pp. 4290–4303, Sep. 2012.

Those skilled in the art will appreciate that the drawings are schematic diagrams of a preferred embodiment, and that the numbering of the embodiments of the present invention is for description only and does not indicate the relative merit of the embodiments.

The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims

1. A deep-learning-based 3D model retrieval method, characterized in that the 3D model retrieval method comprises the following steps:

convolving each view in a two-dimensional view set with trained filter templates to form primary view features;

using the primary view features as the input of multiple recurrent neural networks and learning advanced view features, the advanced view features characterizing the intrinsic structure of the data;

representing each view by its advanced view features, calculating the similarity between views of different 3D models, from that calculating the similarity between different 3D models, and sorting in descending order to obtain the final retrieval result;

wherein the step of calculating the similarity between different 3D models is specifically defined as follows:

Q and M are two different 3D models; n and m are the numbers of views in Q and M, respectively; P(v_i, w_j) is the similarity between the i-th view in Q and the j-th view in M; and S(Q, M) is the similarity between 3D models Q and M.

2. The deep-learning-based 3D model retrieval method according to claim 1, characterized in that the step of convolving each view in the two-dimensional view set with the trained filter templates to form primary view features specifically includes:

applying pooling to reduce the dimensionality of the convolved features, selecting a region of a given size as the pooling region, and taking the pooled convolution features as the primary view features.

3. The deep-learning-based 3D model retrieval method according to claim 1, characterized in that the step of representing each view by its advanced view features and calculating the similarity between views of different 3D models is specifically defined as follows:

v and w are any two views taken from two different 3D models; a feature mapping function maps each view to its feature representation; and P(v, w) is the similarity between views v and w.

4. A deep-learning-based 3D model retrieval device, characterized in that the 3D model retrieval device comprises:

a convolution module, configured to convolve each view in a two-dimensional view set with trained filter templates to form primary view features;

a learning module, configured to use the primary view features as the input of multiple recurrent neural networks and learn advanced view features;

a computing module, configured to represent each view by its advanced view features, calculate the similarity between views of different 3D models, and from that calculate the similarity between different 3D models;

a sorting module, configured to sort in descending order to obtain the final retrieval result.

5. The deep-learning-based 3D model retrieval device according to claim 4, characterized in that the convolution module includes:

a selection submodule, configured to select a region of a given size as the pooling region and take the pooled convolution features as the primary view features.