CN111597367B - Three-dimensional model retrieval method based on view and hash algorithm - Google Patents


Info

Publication number
CN111597367B
Authority
CN
China
Prior art keywords
layer
model
dimensional
hash
convolution
Prior art date
Legal status
Active
Application number
CN202010418065.3A
Other languages
Chinese (zh)
Other versions
CN111597367A (en)
Inventor
张满囤
燕明晓
王红
田琪
崔时雨
齐畅
魏玮
吴清
王小芳
Current Assignee
Hebei University of Technology
Original Assignee
Hebei University of Technology
Application filed by Hebei University of Technology filed Critical Hebei University of Technology
Priority to CN202010418065.3A priority Critical patent/CN111597367B/en
Publication of CN111597367A publication Critical patent/CN111597367A/en
Application granted granted Critical
Publication of CN111597367B publication Critical patent/CN111597367B/en

Classifications

    • G — PHYSICS
    • G06F 16/51 — Information retrieval of still image data: indexing; data structures therefor; storage structures
    • G06F 16/583 — Retrieval characterised by using metadata automatically derived from the content
    • G06N 3/045 — Neural networks: combinations of networks
    • G06N 3/048 — Neural networks: activation functions
    • G06N 3/08 — Neural networks: learning methods
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to a three-dimensional model retrieval method based on a view and hash algorithm. The method obtains multiple view pictures of different three-dimensional models captured from different angles and normalizes them; constructs a convolutional neural network based on AlexNet, in which a view layer after 5 convolution layers connects to two fully connected layers and a hash layer is added after the last fully connected layer to convert high-dimensional features into low-dimensional hash codes, with a quantization loss function designed for the conversion process to reduce the quantization error of the hash codes; trains the network on an existing three-dimensional model data set, so that the features of each model are represented by the hash features learned by the trained network; and computes the similarity between any given query three-dimensional model and the models in the three-dimensional model database using the Hamming distance, outputting the few models with the smallest Hamming distance as results to a retrieval list, thereby improving the retrieval efficiency of three-dimensional models.

Description

Three-dimensional model retrieval method based on view and hash algorithm
Technical Field
The technical scheme of the invention relates to three-dimensional (3D) model retrieval, in particular to a three-dimensional model retrieval method based on view and hash algorithm.
Background
With the advent of the big data age, image acquisition has become simpler and its acquisition modes more varied. The emergence of low-cost 3D acquisition devices and 3D modeling tools in recent years has led to a rapid increase in the number of three-dimensional models, with very large three-dimensional model resources already on the network. Three-dimensional models are increasingly widely used in three-dimensional games, virtual reality, industrial design, video entertainment and other fields, and the demand for accurate and efficient three-dimensional object retrieval is increasingly apparent.
Current three-dimensional model retrieval work can mainly be divided into two approaches: model-based retrieval and view-based retrieval. Model-based retrieval represents model features directly from three-dimensional data, such as polygonal meshes, voxel grids, point clouds, or implicit surfaces. Model-based methods better retain the original data information and spatial geometric characteristics of the three-dimensional model. However, it is sometimes difficult to represent real-world models directly with three-dimensional data, and open-source three-dimensional feature model databases are currently relatively few. View-based retrieval represents a three-dimensional model by a set of two-dimensional images, reducing the matching between three-dimensional models to the two-dimensional level and querying the model to be searched by matching the similarity of the views, so the problem of overfitting can largely be avoided. However, existing view-based algorithms complete the similarity retrieval by measuring the extracted high-dimensional features in Euclidean space, and the retrieval efficiency is low. Improving model retrieval efficiency is therefore key to improving three-dimensional model retrieval performance.
Disclosure of Invention
Aiming at the low retrieval efficiency of current view-based three-dimensional model retrieval algorithms, the invention provides a three-dimensional model retrieval method based on a view and hash algorithm. In this method, a hash algorithm is added at the last layer of the convolutional neural network: the features extracted by the convolution layers are processed by a view layer, high-dimensional features are converted into hash code features by the hash layer, and the similarity of models is calculated in the low-dimensional Hamming space using Hamming distances, thereby improving model retrieval efficiency.
The technical scheme adopted to solve the technical problem is as follows: the three-dimensional model retrieval method based on the view and hash algorithm comprises obtaining multiple view pictures captured from different angles of different three-dimensional models, and normalizing them;
constructing a convolutional neural network based on AlexNet: after 5 convolution layers, a view layer connects to two fully connected layers, and a hash layer is added after the last fully connected layer to convert high-dimensional features into low-dimensional hash codes, with a quantization loss function designed for the conversion process to reduce the quantization error of the hash codes;
training the AlexNet-based convolutional neural network with an existing three-dimensional model data set, so that the features of each model are represented by the hash features learned by the trained network; and calculating the similarity between any given query three-dimensional model and the three-dimensional models in the database using the Hamming distance, where a larger Hamming distance means less similar models and a smaller Hamming distance means more similar models; the models are sorted by Hamming distance from small to large, and the top-ranked models are selected as results and output to a retrieval list.
In the above retrieval method, model scale standardization is performed on the different three-dimensional models before the view pictures are obtained. Because the models on the network are varied and enormous in number, all models in the data set must be standardized to avoid the influence of model size and scale during retrieval. The models are scaled into cubes with side length 2, which ensures the uniformity and usability of the model features. The specific steps are as follows:
Step 2-1, read the information of each point of the three-dimensional model, and find the minimum coordinate point (x_min, y_min, z_min) and the maximum coordinate point (x_max, y_max, z_max).
Step 2-2, calculating the difference value between the maximum coordinate point and the minimum coordinate point, taking the maximum value of the difference values in three dimensions as the side length l of the model bounding box, constructing a cube bounding box, and placing the center of the model on the body center of the cube;
step 2-3, scaling the model to obtain a standardized model: the coordinates (x, y, z) of any point are scaled to obtain new coordinates (x ', y ', z '), and the specific calculation method is as follows:
x′ = (x − x_min) × 2/l − 1
y′ = (y − y_min) × 2/l − 1
z′ = (z − z_min) × 2/l − 1
After normalization, the coordinates of all points of the model lie in [−1, 1], and the model lies within a cube of side length 2; this yields the normalized model.
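As a minimal NumPy sketch of steps 2-1 to 2-3 (the function name and array layout are illustrative, not from the patent):

```python
import numpy as np

def normalize_model(points):
    """Scale a model's vertices into the cube [-1, 1]^3 (steps 2-1 to 2-3).

    points: (N, 3) array of (x, y, z) vertex coordinates.
    """
    p_min = points.min(axis=0)                 # (x_min, y_min, z_min)
    p_max = points.max(axis=0)                 # (x_max, y_max, z_max)
    l = (p_max - p_min).max()                  # bounding-cube side length
    return (points - p_min) * 2.0 / l - 1.0    # x' = (x - x_min) * 2/l - 1
```

Every coordinate of the returned array lies in [−1, 1], with the model's largest axis spanning the full side of the cube.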
In the above retrieval method, the multi-view pictures are obtained as follows: a virtual camera array is arranged around each model, 12 view pictures are taken of each model, the pictures are normalized to a uniform size, and the normalized pictures are used as the input of the convolutional neural network.
Step 3-1, place the standardized model at the body center of a regular icosahedron, and place virtual cameras at the 12 vertices of the regular icosahedron to photograph the model, obtaining a group of 12 views of the model at size 256×256;
Step 3-2, crop the multiple views of the model to a size of 227×227 and use them as the input of the convolutional neural network. The cropping is computed as:
left = C_w/2 − C′_w/2
top = C_h/2 − C′_h/2
right = left + C′_w
bottom = top + C′_h
where left, top, right and bottom are the left, top, right and bottom boundaries of a crop of new size (C′_w, C′_h) within the original size (C_w, C_h).
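A hedged sketch of this center crop (integer division is an assumption, chosen to match the whole-pixel boundaries quoted later in the description):

```python
def center_crop_box(cw, ch, new_cw, new_ch):
    """Boundaries of a (new_cw, new_ch) window centered in a (cw, ch) image.

    Returns (left, top, right, bottom): left = C_w/2 - C'_w/2, etc.
    """
    left = cw // 2 - new_cw // 2
    top = ch // 2 - new_ch // 2
    return left, top, left + new_cw, top + new_ch
```

For the 256×256 views cropped to 227×227 this gives (15, 15, 242, 242).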
The specific structure of the convolutional neural network based on AlexNet is as follows:
Step 4-1, input the 12 views of size 227×227 of every model into the convolutional neural network in turn. Local features of each image are first acquired by the convolution and pooling layers, configured as follows:
the first layer comprises a convolution layer and a maximum pooling layer, the convolution layer has a convolution kernel size of 11×11 and a step size of 4, and the activation function is set as a Relu function. And then carrying out pooling operation on the convolution result, wherein the convolution kernel size of the maximum pooling layer is 3 multiplied by 3, and the step length is 2.
The second layer comprises a convolution layer and a maximum pooling layer, the convolution layer has a convolution kernel size of 5×5, a step size of 1, and an activation function is set as a Relu function. And then carrying out pooling operation on the convolution result, wherein the convolution kernel size of the maximum pooling layer is 3 multiplied by 3, and the step length is 2.
The third layer comprises a convolution layer, the convolution kernel of the convolution layer has a size of 3×3, the step size is 1, and the activation function is set as a Relu function.
The fourth layer comprises a convolution layer, the convolution kernel of the convolution layer has a size of 3×3, the step size is 1, and the activation function is set as a Relu function.
The fifth layer comprises a convolution layer and a maximum pooling layer, the convolution layer has a convolution kernel size of 3×3, a step size of 1, and an activation function is set as a Relu function. And then carrying out pooling operation on the convolution result, wherein the convolution kernel size of the maximum pooling layer is 3 multiplied by 3, and the step length is 2.
Step 4-2, a view layer is added after the fifth convolution layer. The view layer processes the convolutional features of the 12 pictures of each model: it compares the 12 pictures and takes the maximum feature value in each dimension across the pictures, generating the feature descriptor of the three-dimensional model, which is then input into the fully connected layers. There are 2 identical fully connected layers, each with 4096 neurons; a ReLU activation function is added to avoid vanishing gradients, and a dropout layer randomly sets neuron values to 0, reducing network parameters and complexity and preventing overfitting.
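The element-wise maximum taken by the view layer can be sketched as follows (the flattened feature shape is an illustrative assumption):

```python
import numpy as np

def view_pool(view_features):
    """View layer: element-wise maximum over the per-view features.

    view_features: (num_views, feature_dim) array of convolutional
    features for one model (num_views = 12 in this method); returns
    a single (feature_dim,) descriptor for the model.
    """
    return view_features.max(axis=0)
```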
Step 4-3, a hash layer is added after the fully connected layers. The hash layer contains k hidden neurons (k being the number of bits of the hash code) with a sigmoid activation function. The 4096-dimensional features output by the fully connected layers are mapped to a low-dimensional space to form low-dimensional hash features f_n, which are further converted into discrete hash codes b_n by b_n = sgn(f_n − 0.5). At the same time, a quantization loss function L_ql is designed to control the error of the hash quantization process, subject to b_n ∈ {0, 1}^k, where N is the number of input samples and k is the number of hash code bits.
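A sketch of the binarization step; the exact expression of L_ql is not given in this text, so the loss below (the mean squared distance of each sigmoid activation from its binarized value, which shrinks as activations approach 0 or 1) is an assumption of one common form:

```python
import numpy as np

def binarize(f):
    """b_n = sgn(f_n - 0.5), with bits encoded in {0, 1}.

    f: array of sigmoid hash-layer activations in (0, 1).
    """
    return (f >= 0.5).astype(np.int8)

def quantization_loss(f):
    """Assumed form of L_ql: mean squared distance of each activation
    from its binarized value -- small when activations are near 0 or 1."""
    return float(np.mean((f - binarize(f)) ** 2))
```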
When training the network, the public Princeton three-dimensional model data set ModelNet40 is used. After model scale standardization and multi-view picture normalization, the training set data are input into the AlexNet-based convolutional neural network for training, and the network parameters are optimized to generate a network model; the generated network model is then evaluated on the test set. The invention uses the TensorFlow deep learning framework, with Python 3.6 as the language.
The hamming distance is calculated by:
Hash code features corresponding to the features of each model are obtained, and the similarity between models is represented by the Hamming distance D: the larger the Hamming distance, the less similar the models; the smaller the Hamming distance, the more similar. The Hamming distance is computed as D(b_i, b_j) = Σ_{t=1}^{k} (b_i^t ⊕ b_j^t), where b_i and b_j are the hash features of the two models and ⊕ is the exclusive-or operation. For any query three-dimensional model Q, a similarity measurement is made against the three-dimensional models in the database M, and the matching model Q* is computed as:
Q* = S(Q, M) = argmin_{M_m ∈ M} D(b_Q, b_{M_m})
where S represents the similarity between models, M_m is the m-th model in the database (1 ≤ m ≤ N*), and N* is the number of samples in the database. Finally, the 10 models most similar to the query are output as results to the retrieval list.
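A minimal sketch of the Hamming-distance ranking (function and variable names are illustrative):

```python
import numpy as np

def hamming_distance(b_i, b_j):
    """D(b_i, b_j): number of differing bits (XOR, then sum)."""
    return int(np.sum(b_i ^ b_j))

def retrieve(query_code, db_codes, top=10):
    """Indices of the `top` database hash codes closest to the query,
    sorted by increasing Hamming distance."""
    dists = [hamming_distance(query_code, c) for c in db_codes]
    return [int(i) for i in np.argsort(dists, kind="stable")[:top]]
```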
Compared with the prior art, the invention has the beneficial effects that:
1. For the task of efficient three-dimensional model retrieval, an algorithm based on views and hash learning is proposed. The method combines the advantages of convolutional neural networks, multiple views and hash-based search, and obtains better results in three-dimensional model retrieval. In the convolutional network design of the invention, the front convolution layers process the multiple views and a view pooling layer (view layer) combines the multiple views of the three-dimensional model, which are then fed into the rear network for feature extraction; a hash layer is added after the fully connected layers as the last layer, where high-dimensional features are learned into hash features through a hash algorithm. The loss error of hash quantization is controlled and nearly lossless hash codes are generated, so three-dimensional retrieval precision and efficiency can be improved.
2. The retrieval method of the invention performs scale standardization on the initially obtained three-dimensional model data, so it is applicable to the varied models in a data set or on a network and avoids the extracted features being affected by excessive differences in scale. In the embodiment, 12 view pictures of each three-dimensional model are used for feature extraction, which improves detection precision and efficiency without causing a large amount of redundancy. To test the performance of the algorithm, it is compared with existing algorithms on the ModelNet40 data set, and the results show that the method performs well.
3. The method of the invention introduces a hash layer and adds a specific quantization loss function to control the quantization error of the hash code conversion process; the low-dimensional hash features allow fast retrieval using the Hamming distance, ensuring retrieval efficiency.
Drawings
Fig. 1 is a general flow chart of the present invention.
FIG. 2 is a normalized processing result of an example three-dimensional model of the present invention.
Fig. 3 is a two-dimensional projection process of a three-dimensional model of the present invention.
FIG. 4 is a set of two-dimensional views obtained from projection of an example model in accordance with the present invention.
Fig. 5 is a network hierarchy diagram of the present invention.
Fig. 6 is an ROC graph comparing the performance of the present invention with other advanced algorithms on the ModelNet40 dataset. The references for the other algorithms in fig. 6 are listed below.
[1] Su H, Maji S, Kalogerakis E, et al. Multi-view convolutional neural networks for 3D shape recognition // IEEE International Conference on Computer Vision, Santiago, 2015: 945-953.
[2] Wu Z, Song S, Khosla A, et al. 3D ShapeNets: a deep representation for volumetric shapes // 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, 2015: 1912-1920.
[3] Cheng H C, Lo C H, Chu C H, Kim Y S. Shape similarity measurement for 3D mechanical part using D2 shape distribution and negative feature decomposition. Computers in Industry, 2010, 62(3): 269-280.
[4] Zhou K, Gong M, Huang X, Guo B. Data-parallel octrees for surface reconstruction. IEEE Transactions on Visualization and Computer Graphics, 2011, 17(5): 669-681.
Detailed Description
The invention will be further described with reference to the accompanying drawings; the scope of protection of the invention is not limited thereto.
As shown in fig. 1, the three-dimensional model retrieval method based on the view and hash algorithm mainly comprises 7 modules: inputting a three-dimensional model; model standardization; acquiring a two-dimensional view of the model; designing a convolutional neural network structure; training a convolutional neural network structure; generating model features; and (5) retrieving model similarity.
1. Input model module
The invention uses the ModelNet40 data set published by Princeton University for experiments. The data set includes 40 general model classes, and the models of each class are divided into a training set and a test set; the 9461 models of the training set are used for training.
2. Model normalization
Because the models on the network are varied and enormous in number, scale standardization must be performed on all models in the data set to avoid being influenced by model size and scale during retrieval. For the aircraft model in fig. 2, the model normalization steps are:
step 2-1, reading information of each point of the aircraft model, and finding a coordinate point (x min ,y min ,z min ) And the maximum coordinate point (x max ,y max ,z max )。
Step 2-2, calculate (x_max − x_min), (y_max − y_min), (z_max − z_min), take the maximum of the three as the side length l of the model bounding box, construct a cube bounding box, and place the center of the model at the body center of the cube.
Step 2-3, scale the model to obtain the standardized model. The coordinates (x, y, z) of any point are scaled to new coordinates (x′, y′, z′) as follows:
x′ = (x − x_min) × 2/l − 1
y′ = (y − y_min) × 2/l − 1
z′ = (z − z_min) × 2/l − 1
After normalization, the coordinates of all points of the model lie in [−1, 1]. As shown in fig. 2, the normalized model lies within a cube of side length 2.
3. Acquiring a two-dimensional view of a model
Step 3-1, as shown in fig. 3, the model is placed at the body center of a regular icosahedron, and virtual cameras are placed at the 12 vertices of the regular icosahedron to photograph it, obtaining a group of 12 views of the model. Fig. 4 shows the 12 views of size 256×256 taken of the example aircraft model.
Step 3-2 crops the multiple views of the model to a size of 227×227 as the input of the convolutional neural network. The cropping is computed as:
left = C_w/2 − C′_w/2
top = C_h/2 − C′_h/2
right = left + C′_w
bottom = top + C′_h
where C_w = C_h = 256 and C′_w = C′_h = 227; the computed crop is left = 15, top = 15, right = 242, bottom = 242.
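The camera placement of step 3-1 needs the 12 vertices of a regular icosahedron. A standard construction (this helper is illustrative, not from the patent) uses the cyclic permutations of (0, ±1, ±φ), with φ the golden ratio:

```python
import numpy as np

PHI = (1 + 5 ** 0.5) / 2  # golden ratio

def icosahedron_vertices():
    """The 12 vertices of a regular icosahedron centered at the origin:
    all cyclic permutations of (0, +/-1, +/-PHI)."""
    verts = []
    for a in (-1.0, 1.0):
        for b in (-PHI, PHI):
            verts += [(0.0, a, b), (a, b, 0.0), (b, 0.0, a)]
    return np.array(verts)
```

All 12 vertices are equidistant from the origin (distance √(1 + φ²)), so cameras placed there view the centered model from a uniform radius.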
4. Design convolutional neural network structure
Step 4-1, the cropped multi-view pictures of each model are input into the convolutional neural network; the network structure is shown in fig. 5. Local features of each image are first acquired by the convolution and pooling layers, configured as follows:
the first layer comprises a convolution layer and a maximum pooling layer, the convolution layer has a convolution kernel size of 11×11 and a step size of 4, and the activation function is set as a Relu function. And then carrying out pooling operation on the convolution result, wherein the convolution kernel size of the maximum pooling layer is 3 multiplied by 3, and the step length is 2.
The second layer comprises a convolution layer and a maximum pooling layer, the convolution layer has a convolution kernel size of 5×5, a step size of 1, and an activation function is set as a Relu function. And then carrying out pooling operation on the convolution result, wherein the convolution kernel size of the maximum pooling layer is 3 multiplied by 3, and the step length is 2.
The third layer comprises a convolution layer, the convolution kernel of the convolution layer has a size of 3×3, the step size is 1, and the activation function is set as a Relu function.
The fourth layer comprises a convolution layer, the convolution kernel of the convolution layer has a size of 3×3, the step size is 1, and the activation function is set as a Relu function.
The fifth layer comprises a convolution layer and a maximum pooling layer, the convolution layer has a convolution kernel size of 3×3, a step size of 1, and an activation function is set as a Relu function. And then carrying out pooling operation on the convolution result, wherein the convolution kernel size of the maximum pooling layer is 3 multiplied by 3, and the step length is 2.
Step 4-2, a view layer is added after the fifth convolution layer. The view layer processes the convolutional features of the 12 pictures of each model: it compares the 12 pictures and takes the maximum feature value in each dimension across the pictures, generating the feature descriptor of the three-dimensional model, which is then input into the fully connected layers. There are 2 identical fully connected layers, each with 4096 neurons; a ReLU activation function is added to avoid vanishing gradients, and a dropout layer randomly sets neuron values to 0, reducing network parameters and complexity and preventing overfitting.
Step 4-3, a hash layer is added after the fully connected layers. The hash layer contains k hidden neurons (k being the number of bits of the hash code) with a sigmoid activation function. The 4096-dimensional features output by the fully connected layers are mapped to a low-dimensional space to form low-dimensional hash features f_n, which are further converted into discrete hash codes b_n by b_n = sgn(f_n − 0.5). At the same time, a quantization loss function L_ql is designed to control the error of the hash quantization process, subject to b_n ∈ {0, 1}^k, where N is the number of input samples and k is the number of hash code bits. In the experiments, N is set to 9461 and k to 48.
5. Training convolutional neural network architecture
The invention uses the TensorFlow deep learning framework, with Python 3.6 as the language. Training uses the training set of the ModelNet40 dataset, 9461 models in total, with batch_size set to 16 and the learning rate set to 0.0001.
6. Generating model features
Through training on the training set, a network model that learns model hash features well is generated. The hash features of a model are output by the last hash layer, and each model has a 48-bit hash feature; the hash feature of the airplane model in fig. 2 is [011101111001100110110111101110110110001100111010].
7. Model similarity retrieval
The features of each model are represented by the hash code learned by the trained network in the fourth step. The hash layer maps the high-dimensional features of the model to hash code features in the low-dimensional Hamming space. The similarity between models is therefore represented by the Hamming distance D: the larger the Hamming distance, the less similar the models; the smaller the Hamming distance, the more similar. The Hamming distance is computed as D(b_i, b_j) = Σ_{t=1}^{k} (b_i^t ⊕ b_j^t), where b_i and b_j are the hash features of the two models and ⊕ is the exclusive-or operation. For any query three-dimensional model Q, a similarity measurement is made against the three-dimensional models in the database M, and the matching model Q* is computed as:
Q* = S(Q, M) = argmin_{M_m ∈ M} D(b_Q, b_{M_m})
where S represents the similarity between models, M_m is the m-th model in the database (1 ≤ m ≤ N*), and N* is the number of samples in the database M. The matching model Q* is finally obtained, and the 10 models most similar to the query are output as results to the retrieval list. Retrieving the airplane model airplane_0219.off of fig. 2 returns the 10 nearest models ['airplane_0219.off', 'airplane_0115.off', 'airplane_0218.off', 'airplane_0002.off', 'airplane_0027.off', 'airplane_0566.off', 'airplane_0020.off', 'airplane_0374.off', 'airplane_0613.off', 'airplane_0276.off'].
To verify the effectiveness of the present invention, a comparison was made with 5 other advanced algorithms on the public three-dimensional model dataset ModelNet40. Fig. 6 shows the Receiver Operating Characteristic (ROC) curve of each algorithm. The abscissa is the false positive rate (FPR), the proportion of samples predicted positive but actually negative among all negative samples; the ordinate is the true positive rate (TPR), the proportion of samples predicted positive and actually positive among all positive samples. The closer a point on the curve is to the upper left, the higher the true positive rate and the lower the false positive rate, and thus the stronger the algorithm's discrimination and the better its performance. From the results in the figure, the three-dimensional model retrieval method based on the view and hash algorithm performs excellently.
In the above embodiments, the AlexNet convolutional neural network, the ModelNet40 dataset, the TensorFlow deep learning framework, the ReLU activation function, the dropout layer, and the sigmoid activation function are well known in the art.
The foregoing is a detailed description of embodiments of the invention with reference to the accompanying drawings; the description of the embodiments is only intended to facilitate a better understanding of the method of the invention. It will be understood by those skilled in the art that various modifications and equivalents may be made without departing from the spirit and scope of the invention as defined by the appended claims.
Aspects not described in the invention follow the prior art.

Claims (6)

1. A three-dimensional model retrieval method based on a view and hash algorithm, comprising obtaining multiple view pictures captured from different angles of different three-dimensional models, and normalizing them;
constructing a convolutional neural network based on AlexNet: after 5 convolution layers, a view layer connects to two fully connected layers, and a hash layer is added after the last fully connected layer to convert high-dimensional features into low-dimensional hash codes, with a quantization loss function designed for the conversion process to reduce the quantization error of the hash codes;
training the AlexNet-based convolutional neural network with an existing three-dimensional model data set, the features of each model being represented by the hash features learned by the trained network; calculating the similarity between any given query three-dimensional model and the three-dimensional models in the three-dimensional model database using the Hamming distance, where a larger Hamming distance indicates less similar models and a smaller Hamming distance indicates more similar models; sorting in ascending order of Hamming distance, and selecting the top-ranked models as results to output to a retrieval list;
the view layer selects, for each feature dimension, the maximum value over the multiple pictures of the same three-dimensional model after feature extraction by the 5 convolution layers, generating a feature descriptor of the three-dimensional model that is input into the fully connected layers for processing;
the high-dimensional features output by the fully connected layers are transcoded into low-dimensional hash features f_n through the hash layer; the hash features f_n are then converted into discrete hash codes b_n according to b_n = sgn(f_n - 0.5); the quantization loss function L_ql in the conversion process is L_ql = Σ_{n=1}^{N} ||b_n - f_n||², s.t. b_n ∈ {0,1}^k, where N is the number of input samples and k is the number of hash code bits.
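The binarization and quantization loss of claim 1 can be sketched in numpy. The exact loss expression is rendered as an image in the source, so the squared-error form ||b_n - f_n||² used here is an assumption consistent with the stated constraint b_n ∈ {0,1}^k and the binarization rule:

```python
import numpy as np

def binarize(f):
    """b_n = sgn(f_n - 0.5): sigmoid outputs in (0, 1) map to codes in {0, 1}."""
    return (f >= 0.5).astype(np.int8)

def quantization_loss(f):
    """Assumed squared-error form: sum over samples of ||b_n - f_n||^2,
    which is small exactly when the sigmoid outputs lie near 0 or 1."""
    return float(np.sum((binarize(f) - f) ** 2))

f = np.array([[0.9, 0.1, 0.6]])   # one sample, k = 3 hash bits
b = binarize(f)                   # -> [[1, 0, 1]]
loss = quantization_loss(f)       # 0.01 + 0.01 + 0.16 = 0.18
```

Minimizing this loss during training pushes the sigmoid outputs toward 0 or 1, so little information is lost when thresholding.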
2. The retrieval method according to claim 1, wherein model scale standardization is performed on the different three-dimensional models before the view pictures are acquired, scaling models of different scales into a cube with side length 2; the specific steps are as follows:
1) Reading the information of each point of the three-dimensional model, and finding the minimum coordinate point (x_min, y_min, z_min) and the maximum coordinate point (x_max, y_max, z_max);
2) Calculating the difference between the maximum and minimum coordinate points, taking the maximum of the differences in the three dimensions as the side length l of the model bounding box, constructing a cubic bounding box, and placing the center of the model at the center of the cube;
3) Scaling the model to obtain a standardized model: the coordinates (x, y, z) of any point are scaled to obtain new coordinates (x′, y′, z′), calculated as follows:
x′ = (x - x_min) × 2/l - 1
y′ = (y - y_min) × 2/l - 1
z′ = (z - z_min) × 2/l - 1
after normalization, the coordinates of all points of the model lie in [-1, 1] and the model lies in a cube with side length 2, giving the standardized model.
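The scaling of claim 2 can be sketched directly from the formulas above; the two-point "model" below is illustrative:

```python
import numpy as np

def normalize_model(points):
    """Scale a point set into the side-2 cube (claim 2):
    x' = (x - x_min) * 2 / l - 1, with l the largest bounding-box extent."""
    mn = points.min(axis=0)
    l = (points.max(axis=0) - mn).max()   # bounding-box side length
    return (points - mn) * 2.0 / l - 1.0

# Illustrative two-point "model"; bounding-box side length l = 4.
pts = np.array([[0.0, 0.0, 0.0],
                [4.0, 2.0, 1.0]])
out = normalize_model(pts)  # -> [[-1, -1, -1], [1, 0, -0.5]]
```

Using a single scale factor l for all three axes preserves the model's aspect ratio.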
3. The retrieval method according to claim 2, wherein the multi-view picture acquisition process is: a virtual camera array is arranged around the model, 12 view pictures are taken of each model, and the normalized view pictures are processed into a uniform size and then used as input to the convolutional neural network; the specific steps are as follows:
1) Placing the standardized model at the body center of a regular icosahedron, and placing virtual cameras at the 12 vertices of the regular icosahedron for shooting, obtaining a group of 12 views of the model with size 256×256;
2) Cropping the multi-view pictures of the model to a size of 227×227 as input to the convolutional neural network; the cropping method is:
left = C_w/2 - C′_w/2
top = C_h/2 - C′_h/2
right = left + C′_w
bottom = top + C′_h
wherein top, bottom, left and right respectively represent the upper, lower, left and right boundaries of the crop of the new size (C′_w, C′_h) within the original size (C_w, C_h).
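The cropping formulas of claim 3 can be sketched as follows; floor (integer) division for the /2 terms is an assumption, since the claim does not state how odd differences are rounded:

```python
def center_crop_box(cw, ch, new_w, new_h):
    """(left, top, right, bottom) of a centered crop, per the claim's formulas.
    Integer floor division is assumed for the /2 terms."""
    left = cw // 2 - new_w // 2
    top = ch // 2 - new_h // 2
    return left, top, left + new_w, top + new_h

# Crop a 256x256 view down to the 227x227 network input.
box = center_crop_box(256, 256, 227, 227)  # -> (15, 15, 242, 242)
```

The resulting box is 227 pixels wide and high, centered (to within one pixel) in the 256×256 view.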
4. The retrieval method according to claim 3, wherein the specific structure of the AlexNet-based convolutional neural network is:
1) The 12 views of size 227×227 of each model are sequentially input into the convolutional neural network; first, convolution and pooling layers are used to acquire the local features of the images, with the following specific settings:
the first layer comprises a convolution layer and a max pooling layer; the convolution layer has a kernel size of 11×11 and a stride of 4, with the activation function set to ReLU; the convolution result is then pooled, the max pooling layer having a kernel size of 3×3 and a stride of 2;
the second layer comprises a convolution layer and a max pooling layer; the convolution layer has a kernel size of 5×5 and a stride of 1, with the activation function set to ReLU; the convolution result is then pooled, the max pooling layer having a kernel size of 3×3 and a stride of 2;
the third layer comprises a convolution layer with a kernel size of 3×3 and a stride of 1, with the activation function set to ReLU;
the fourth layer comprises a convolution layer with a kernel size of 3×3 and a stride of 1, with the activation function set to ReLU;
the fifth layer comprises a convolution layer and a max pooling layer; the convolution layer has a kernel size of 3×3 and a stride of 1, with the activation function set to ReLU; the convolution result is then pooled, the max pooling layer having a kernel size of 3×3 and a stride of 2;
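The feature-map sizes through the five layers above can be checked with the standard convolution output formula. The padding values below (none for the 11×11 convolution, 2 for the 5×5, 1 for the 3×3 convolutions) are assumptions matching the usual AlexNet configuration, which the claim does not state:

```python
def conv_out(n, k, s, p=0):
    """Spatial output size of a convolution or pooling layer:
    floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

n = 227
n = conv_out(n, 11, 4)       # conv1, 11x11 stride 4 -> 55
n = conv_out(n, 3, 2)        # pool1, 3x3 stride 2   -> 27
n = conv_out(n, 5, 1, p=2)   # conv2, 5x5 stride 1   -> 27 (assumed padding 2)
n = conv_out(n, 3, 2)        # pool2                 -> 13
n = conv_out(n, 3, 1, p=1)   # conv3                 -> 13 (assumed padding 1)
n = conv_out(n, 3, 1, p=1)   # conv4                 -> 13
n = conv_out(n, 3, 1, p=1)   # conv5                 -> 13
n = conv_out(n, 3, 2)        # pool5                 -> 6
```

Under these assumptions the convolutional stack ends in 6×6 feature maps, as in standard AlexNet.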
2) After the convolution layers, a view layer is added following the fifth convolution layer; the convolved features of the 12 pictures of each three-dimensional model are processed by the view layer, which compares the 12 pictures to obtain the maximum value of each feature dimension across the pictures, generating a feature descriptor of the three-dimensional model that is input into the fully connected layers for processing; the 2 fully connected layers each contain 4096 neurons, with a ReLU activation function added to avoid gradient vanishing and a dropout layer added to randomly set neuron values to 0;
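The view layer's element-wise maximum over per-view features can be sketched in numpy; the example uses 2 views with 3-dimensional features for brevity, while in the network it operates on the convolutional features of all 12 views:

```python
import numpy as np

def view_pool(view_features):
    """View layer: for each feature dimension, keep the largest value
    across all views of the same model (element-wise maximum)."""
    return view_features.max(axis=0)

# Illustrative: 2 views, 3 feature dimensions.
feats = np.array([[1.0, 5.0, 2.0],
                  [3.0, 0.0, 4.0]])
desc = view_pool(feats)  # -> [3.0, 5.0, 4.0]
```

Max pooling across views makes the descriptor invariant to the order in which the 12 views are fed in.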
3) A hash layer is added after the fully connected layers; the hash layer contains k hidden-layer neurons, k being the number of hash code bits, with a sigmoid activation function; the 4096-dimensional features output by the fully connected layers are mapped to a low-dimensional space to form low-dimensional hash features f_n, which are further converted into discrete hash codes b_n, the conversion being b_n = sgn(f_n - 0.5); at the same time, the quantization loss function L_ql is designed as L_ql = Σ_{n=1}^{N} ||b_n - f_n||²,
where b_n ∈ {0,1}^k and N is the number of input samples.
5. The retrieval method according to claim 1, wherein when training the network, the public Princeton three-dimensional model data set ModelNet40 is used; the training set data, after model scale standardization and multi-view picture normalization, is input into the AlexNet-based convolutional neural network for training, and the network parameters are optimized to generate a network model; the generated network model is then used to test the model test set.
6. The retrieval method according to claim 1, wherein the hamming distance calculation process is:
the hash code features corresponding to the features of each model are obtained, and the Hamming distance is calculated as D(b_i, b_j) = Σ (b_i ⊕ b_j), where b_i and b_j are the hash features of two models and ⊕ is the exclusive-or operation; for any query three-dimensional model Q, a similarity measurement is performed against the three-dimensional models in the three-dimensional model database M, and the matched model Q* is calculated as:
S(Q, M) = arg min_{M_m} D(b_Q, b_{M_m})
where S represents the similarity between models, M_m represents the m-th model in the database (1 ≤ m ≤ N*), and N* is the number of samples in the database; through this calculation, the 10 models with the highest similarity to the query model are finally output as results to the retrieval list.
CN202010418065.3A 2020-05-18 2020-05-18 Three-dimensional model retrieval method based on view and hash algorithm Active CN111597367B (en)

Publications (2)

Publication Number Publication Date
CN111597367A (en) 2020-08-28
CN111597367B (en) 2023-11-24





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant