CN111597367B - Three-dimensional model retrieval method based on view and hash algorithm - Google Patents


Info

Publication number
CN111597367B
Authority
CN
China
Prior art keywords
layer
model
dimensional
hash
convolution
Prior art date
Legal status
Active
Application number
CN202010418065.3A
Other languages
Chinese (zh)
Other versions
CN111597367A (en)
Inventor
张满囤
燕明晓
王红
田琪
崔时雨
齐畅
魏玮
吴清
王小芳
Current Assignee
Hebei University of Technology
Original Assignee
Hebei University of Technology
Application filed by Hebei University of Technology filed Critical Hebei University of Technology
Priority to CN202010418065.3A priority Critical patent/CN111597367B/en
Publication of CN111597367A publication Critical patent/CN111597367A/en
Application granted granted Critical
Publication of CN111597367B publication Critical patent/CN111597367B/en

Classifications

    • G — PHYSICS
    • G06F 16/51 — Information retrieval of still image data: indexing; data structures therefor; storage structures
    • G06F 16/583 — Retrieval characterised by using metadata automatically derived from the content
    • G06N 3/045 — Neural networks: combinations of networks
    • G06N 3/048 — Neural networks: activation functions
    • G06N 3/08 — Neural networks: learning methods
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to a three-dimensional model retrieval method based on a view and hash algorithm. The method obtains multiple view pictures of different three-dimensional models captured from different angles and normalizes them; constructs a convolutional neural network based on AlexNet, in which a view layer after 5 convolution layers connects to two fully connected layers and a hash layer is added after the last fully connected layer to convert high-dimensional features into low-dimensional hash codes, with a quantization loss function designed for the conversion process to reduce the quantization error of the hash codes; trains the network on an existing three-dimensional model data set, so that the features of each model are represented by the hash features learned by the trained network; and computes the similarity between any given query three-dimensional model and the models in the three-dimensional model database using the Hamming distance, outputting the few models with the smallest Hamming distance as results to a retrieval list, thereby improving the retrieval efficiency of three-dimensional models.

Description

Three-dimensional model retrieval method based on view and hash algorithm
Technical Field
The technical scheme of the invention relates to three-dimensional (3D) model retrieval, in particular to a three-dimensional model retrieval method based on view and hash algorithm.
Background
With the advent of the big data age, image acquisition has become simpler and its acquisition modes more varied. The emergence of low-cost 3D acquisition devices and 3D modeling tools in recent years has led to a rapid increase in the number of three-dimensional models, with very large three-dimensional model resources already on the network. Three-dimensional models are increasingly widely used in three-dimensional games, virtual reality, industrial design, video entertainment and other fields, and the demand for accurate and efficient three-dimensional object retrieval is increasingly apparent.
Current three-dimensional model retrieval work can mainly be divided into two approaches: model-based retrieval and view-based retrieval. Model-based retrieval represents model features directly from three-dimensional data, such as polygonal meshes, voxel grids, point clouds, or implicit surfaces. Model-based methods better retain the original data information and spatial geometric characteristics of the three-dimensional model. However, it is sometimes difficult to represent real-world models directly with three-dimensional data, and open-source three-dimensional feature model databases are currently relatively few. View-based retrieval represents a three-dimensional model by a set of two-dimensional images, reducing the matching between three-dimensional models to the two-dimensional level and querying the model to be searched by matching the similarity of the views, so the problem of overfitting can largely be avoided. However, existing view-based algorithms complete the similarity retrieval by measuring the extracted high-dimensional features in Euclidean space, and the retrieval efficiency is low. Improving model retrieval efficiency is therefore key to improving three-dimensional model retrieval performance.
Disclosure of Invention
Aiming at the low retrieval efficiency of current view-based three-dimensional model retrieval algorithms, the invention provides a three-dimensional model retrieval method based on a view and hash algorithm. In this method, a hash algorithm is added at the last layer of the convolutional neural network: the features extracted by the convolution layers are processed by a view layer, high-dimensional features are converted into hash code features by the hash layer, and the similarity of models is calculated in the low-dimensional Hamming space using Hamming distances, thereby improving model retrieval efficiency.
The technical scheme adopted to solve the technical problem is as follows: the three-dimensional model retrieval method based on the view and hash algorithm comprises obtaining multiple view pictures captured from different angles of different three-dimensional models, and normalizing them;
constructing a convolutional neural network based on AlexNet: after 5 convolution layers, a view layer connects to two fully connected layers, and a hash layer is added after the last fully connected layer to convert high-dimensional features into low-dimensional hash codes, with a quantization loss function designed for the conversion process to reduce the quantization error of the hash codes;
training the AlexNet-based convolutional neural network with an existing three-dimensional model data set, so that the features of each model are represented by the hash features learned by the trained network; and calculating the similarity between any given query three-dimensional model and the three-dimensional models in the database using the Hamming distance, where a larger Hamming distance means less similar models and a smaller Hamming distance means more similar models; the models are sorted by Hamming distance from small to large, and the top-ranked models are selected as results and output to a retrieval list.
In the above retrieval method, model scale standardization is performed on the different three-dimensional models before the view pictures are obtained. Because the models on the network are varied and enormous in number, all models in the data set must be standardized to avoid the influence of model size and scale during retrieval. The models are scaled into cubes with side length 2, which ensures the uniformity and usability of the model features. The specific steps are as follows:
Step 2-1, read the information of each point of the three-dimensional model, and find the minimum coordinate point (x_min, y_min, z_min) and the maximum coordinate point (x_max, y_max, z_max).
Step 2-2, calculating the difference value between the maximum coordinate point and the minimum coordinate point, taking the maximum value of the difference values in three dimensions as the side length l of the model bounding box, constructing a cube bounding box, and placing the center of the model on the body center of the cube;
step 2-3, scaling the model to obtain a standardized model: the coordinates (x, y, z) of any point are scaled to obtain new coordinates (x ', y ', z '), and the specific calculation method is as follows:
x′ = (x − x_min) × 2/l − 1
y′ = (y − y_min) × 2/l − 1
z′ = (z − z_min) × 2/l − 1
After normalization, the coordinates of all points of the model lie in [−1, 1], and the model lies within a cube of side length 2; this yields the normalized model.
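As a minimal NumPy sketch of steps 2-1 to 2-3 (the function name and array layout are illustrative, not from the patent):

```python
import numpy as np

def normalize_model(points):
    """Scale a model's vertices into the cube [-1, 1]^3 (steps 2-1 to 2-3).

    points: (N, 3) array of (x, y, z) vertex coordinates.
    """
    p_min = points.min(axis=0)                 # (x_min, y_min, z_min)
    p_max = points.max(axis=0)                 # (x_max, y_max, z_max)
    l = (p_max - p_min).max()                  # bounding-cube side length
    return (points - p_min) * 2.0 / l - 1.0    # x' = (x - x_min) * 2/l - 1
```

Every coordinate of the returned array lies in [−1, 1], with the model's largest axis spanning the full side of the cube.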
In the above retrieval method, the multi-view pictures are obtained as follows: a virtual camera array is arranged around each model, 12 view pictures are taken of each model, the pictures are normalized to a uniform size, and the normalized pictures are used as the input of the convolutional neural network.
Step 3-1, place the standardized model at the body center of a regular icosahedron, and place virtual cameras at the 12 vertices of the regular icosahedron to photograph the model, obtaining a group of 12 views of the model at size 256×256;
Step 3-2, crop the multiple views of the model to a size of 227×227 and use them as the input of the convolutional neural network. The cropping is computed as:
left = C_w/2 − C′_w/2
top = C_h/2 − C′_h/2
right = left + C′_w
bottom = top + C′_h
where left, top, right and bottom are the left, top, right and bottom boundaries of a crop of new size (C′_w, C′_h) within the original size (C_w, C_h).
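A hedged sketch of this center crop (integer division is an assumption, chosen to match the whole-pixel boundaries quoted later in the description):

```python
def center_crop_box(cw, ch, new_cw, new_ch):
    """Boundaries of a (new_cw, new_ch) window centered in a (cw, ch) image.

    Returns (left, top, right, bottom): left = C_w/2 - C'_w/2, etc.
    """
    left = cw // 2 - new_cw // 2
    top = ch // 2 - new_ch // 2
    return left, top, left + new_cw, top + new_ch
```

For the 256×256 views cropped to 227×227 this gives (15, 15, 242, 242).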
The specific structure of the convolutional neural network based on AlexNet is as follows:
Step 4-1, input the 12 views of size 227×227 of every model into the convolutional neural network in turn. Local features of each image are first acquired by the convolution and pooling layers, configured as follows:
the first layer comprises a convolution layer and a maximum pooling layer, the convolution layer has a convolution kernel size of 11×11 and a step size of 4, and the activation function is set as a Relu function. And then carrying out pooling operation on the convolution result, wherein the convolution kernel size of the maximum pooling layer is 3 multiplied by 3, and the step length is 2.
The second layer comprises a convolution layer and a maximum pooling layer, the convolution layer has a convolution kernel size of 5×5, a step size of 1, and an activation function is set as a Relu function. And then carrying out pooling operation on the convolution result, wherein the convolution kernel size of the maximum pooling layer is 3 multiplied by 3, and the step length is 2.
The third layer comprises a convolution layer, the convolution kernel of the convolution layer has a size of 3×3, the step size is 1, and the activation function is set as a Relu function.
The fourth layer comprises a convolution layer, the convolution kernel of the convolution layer has a size of 3×3, the step size is 1, and the activation function is set as a Relu function.
The fifth layer comprises a convolution layer and a maximum pooling layer, the convolution layer has a convolution kernel size of 3×3, a step size of 1, and an activation function is set as a Relu function. And then carrying out pooling operation on the convolution result, wherein the convolution kernel size of the maximum pooling layer is 3 multiplied by 3, and the step length is 2.
Step 4-2, a view layer is added after the fifth convolution layer. The view layer processes the convolutional features of the 12 pictures of each model: it compares the 12 pictures and takes the maximum feature value in each dimension across the pictures, generating the feature descriptor of the three-dimensional model, which is then input into the fully connected layers. There are 2 identical fully connected layers, each with 4096 neurons; a ReLU activation function is added to avoid vanishing gradients, and a dropout layer randomly sets neuron values to 0, reducing network parameters and complexity and preventing overfitting.
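The element-wise maximum taken by the view layer can be sketched as follows (the flattened feature shape is an illustrative assumption):

```python
import numpy as np

def view_pool(view_features):
    """View layer: element-wise maximum over the per-view features.

    view_features: (num_views, feature_dim) array of convolutional
    features for one model (num_views = 12 in this method); returns
    a single (feature_dim,) descriptor for the model.
    """
    return view_features.max(axis=0)
```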
Step 4-3, a hash layer is added after the fully connected layers. The hash layer contains k hidden neurons (k being the number of bits of the hash code) with a sigmoid activation function. The 4096-dimensional features output by the fully connected layers are mapped to a low-dimensional space to form low-dimensional hash features f_n, which are further converted into discrete hash codes b_n by b_n = sgn(f_n − 0.5). At the same time, a quantization loss function L_ql is designed to control the error of the hash quantization process, subject to b_n ∈ {0, 1}^k, where N is the number of input samples and k is the number of hash code bits.
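A sketch of the binarization step; the exact expression of L_ql is not given in this text, so the loss below (the mean squared distance of each sigmoid activation from its binarized value, which shrinks as activations approach 0 or 1) is an assumption of one common form:

```python
import numpy as np

def binarize(f):
    """b_n = sgn(f_n - 0.5), with bits encoded in {0, 1}.

    f: array of sigmoid hash-layer activations in (0, 1).
    """
    return (f >= 0.5).astype(np.int8)

def quantization_loss(f):
    """Assumed form of L_ql: mean squared distance of each activation
    from its binarized value -- small when activations are near 0 or 1."""
    return float(np.mean((f - binarize(f)) ** 2))
```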
When training the network, the public Princeton three-dimensional model data set ModelNet40 is used. After model scale standardization and multi-view picture normalization, the training set data are input into the AlexNet-based convolutional neural network for training, and the network parameters are optimized to generate a network model; the generated network model is then evaluated on the test set. The invention uses the TensorFlow deep learning framework, with Python 3.6 as the language.
The hamming distance is calculated by:
Hash code features corresponding to the features of each model are obtained, and the similarity between models is represented by the Hamming distance D: the larger the Hamming distance, the less similar the models; the smaller the Hamming distance, the more similar. The Hamming distance is computed as D(b_i, b_j) = Σ_{t=1}^{k} (b_i^t ⊕ b_j^t), where b_i and b_j are the hash features of the two models and ⊕ is the exclusive-or operation. For any query three-dimensional model Q, a similarity measurement is made against the three-dimensional models in the database M, and the matching model Q* is computed as:
Q* = S(Q, M) = argmin_{M_m ∈ M} D(b_Q, b_{M_m})
where S represents the similarity between models, M_m is the m-th model in the database (1 ≤ m ≤ N*), and N* is the number of samples in the database. Finally, the 10 models most similar to the query are output as results to the retrieval list.
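A minimal sketch of the Hamming-distance ranking (function and variable names are illustrative):

```python
import numpy as np

def hamming_distance(b_i, b_j):
    """D(b_i, b_j): number of differing bits (XOR, then sum)."""
    return int(np.sum(b_i ^ b_j))

def retrieve(query_code, db_codes, top=10):
    """Indices of the `top` database hash codes closest to the query,
    sorted by increasing Hamming distance."""
    dists = [hamming_distance(query_code, c) for c in db_codes]
    return [int(i) for i in np.argsort(dists, kind="stable")[:top]]
```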
Compared with the prior art, the invention has the beneficial effects that:
1. For the task of efficient three-dimensional model retrieval, an algorithm based on views and hash learning is proposed. The method combines the advantages of convolutional neural networks, multiple views and hash-based search, and obtains better results in three-dimensional model retrieval. In the convolutional network design of the invention, the front convolution layers process the multiple views and a view pooling layer (view layer) combines the multiple views of the three-dimensional model, which are then fed into the rear network for feature extraction; a hash layer is added after the fully connected layers as the last layer, where high-dimensional features are learned into hash features through a hash algorithm. The loss error of hash quantization is controlled and nearly lossless hash codes are generated, so three-dimensional retrieval precision and efficiency can be improved.
2. The retrieval method of the invention performs scale standardization on the initially obtained three-dimensional model data, so it is applicable to the varied models in a data set or on a network and avoids the extracted features being affected by excessive differences in scale. In the embodiment, 12 view pictures of each three-dimensional model are used for feature extraction, which improves detection precision and efficiency without causing a large amount of redundancy. To test the performance of the algorithm, it is compared with existing algorithms on the ModelNet40 data set, and the results show that the method performs well.
3. The method of the invention introduces a hash layer and adds a specific quantization loss function to control the quantization error of the hash code conversion process; the low-dimensional hash features allow fast retrieval using the Hamming distance, ensuring retrieval efficiency.
Drawings
Fig. 1 is a general flow chart of the present invention.
FIG. 2 is a normalized processing result of an example three-dimensional model of the present invention.
Fig. 3 is a two-dimensional projection process of a three-dimensional model of the present invention.
FIG. 4 is a set of two-dimensional views obtained from projection of an example model in accordance with the present invention.
Fig. 5 is a network hierarchy diagram of the present invention.
Fig. 6 is an ROC graph comparing the performance of the present invention with other advanced algorithms on the ModelNet40 dataset. The references for the other algorithms in fig. 6 are listed below.
[1] Su H, Maji S, Kalogerakis E, et al. Multi-view convolutional neural networks for 3D shape recognition // IEEE International Conference on Computer Vision, Santiago, 2015: 945-953.
[2] Wu Z, Song S, Khosla A, et al. 3D ShapeNets: a deep representation for volumetric shapes // 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, 2015: 1912-1920.
[3] Cheng H C, Lo C H, Chu C H, Kim Y S. Shape similarity measurement for 3D mechanical part using D2 shape distribution and negative feature decomposition. Computers in Industry, 2010, 62(3): 269-280.
[4] Zhou K, Gong M, Huang X, Guo B. Data-parallel octrees for surface reconstruction. IEEE Transactions on Visualization and Computer Graphics, 2011, 17(5): 669-681.
Detailed Description
The invention will be further described with reference to the accompanying drawings; the scope of protection of the invention is not limited thereto.
As shown in fig. 1, the three-dimensional model retrieval method based on the view and hash algorithm mainly comprises 7 modules: inputting a three-dimensional model; model standardization; acquiring a two-dimensional view of the model; designing a convolutional neural network structure; training a convolutional neural network structure; generating model features; and (5) retrieving model similarity.
1. Input model module
The invention uses the ModelNet40 data set published by Princeton University for experiments. The data set includes 40 general model classes, and the models of each class are divided into a training set and a test set; the 9461 models of the training set are used for training.
2. Model normalization
Because the models on the network are varied and enormous in number, scale standardization must be performed on all models in the data set to avoid being influenced by model size and scale during retrieval. For the aircraft model in fig. 2, the model normalization steps are:
step 2-1, reading information of each point of the aircraft model, and finding a coordinate point (x min ,y min ,z min ) And the maximum coordinate point (x max ,y max ,z max )。
Step 2-2, calculate (x_max − x_min), (y_max − y_min), (z_max − z_min), take the maximum of the three as the side length l of the model bounding box, construct a cube bounding box, and place the center of the model at the body center of the cube.
Step 2-3, scale the model to obtain the standardized model. The coordinates (x, y, z) of any point are scaled to new coordinates (x′, y′, z′) as follows:
x′ = (x − x_min) × 2/l − 1
y′ = (y − y_min) × 2/l − 1
z′ = (z − z_min) × 2/l − 1
After normalization, the coordinates of all points of the model lie in [−1, 1]. As shown in fig. 2, the normalized model lies within a cube of side length 2.
3. Acquiring a two-dimensional view of a model
Step 3-1, as shown in fig. 3, the model is placed at the body center of a regular icosahedron, and virtual cameras are placed at the 12 vertices of the regular icosahedron to photograph it, obtaining a group of 12 views of the model. Fig. 4 shows the 12 views of size 256×256 taken of the example aircraft model.
Step 3-2 crops the multiple views of the model to a size of 227×227 as the input of the convolutional neural network. The cropping is computed as:
left = C_w/2 − C′_w/2
top = C_h/2 − C′_h/2
right = left + C′_w
bottom = top + C′_h
where C_w = C_h = 256 and C′_w = C′_h = 227; the computed crop is left = 15, top = 15, right = 242, bottom = 242.
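The camera placement of step 3-1 needs the 12 vertices of a regular icosahedron. A standard construction (this helper is illustrative, not from the patent) uses the cyclic permutations of (0, ±1, ±φ), with φ the golden ratio:

```python
import numpy as np

PHI = (1 + 5 ** 0.5) / 2  # golden ratio

def icosahedron_vertices():
    """The 12 vertices of a regular icosahedron centered at the origin:
    all cyclic permutations of (0, +/-1, +/-PHI)."""
    verts = []
    for a in (-1.0, 1.0):
        for b in (-PHI, PHI):
            verts += [(0.0, a, b), (a, b, 0.0), (b, 0.0, a)]
    return np.array(verts)
```

All 12 vertices are equidistant from the origin (distance √(1 + φ²)), so cameras placed there view the centered model from a uniform radius.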
4. Design convolutional neural network structure
Step 4-1, the cropped multi-view pictures of each model are input into the convolutional neural network; the network structure is shown in fig. 5. Local features of each image are first acquired by the convolution and pooling layers, configured as follows:
the first layer comprises a convolution layer and a maximum pooling layer, the convolution layer has a convolution kernel size of 11×11 and a step size of 4, and the activation function is set as a Relu function. And then carrying out pooling operation on the convolution result, wherein the convolution kernel size of the maximum pooling layer is 3 multiplied by 3, and the step length is 2.
The second layer comprises a convolution layer and a maximum pooling layer, the convolution layer has a convolution kernel size of 5×5, a step size of 1, and an activation function is set as a Relu function. And then carrying out pooling operation on the convolution result, wherein the convolution kernel size of the maximum pooling layer is 3 multiplied by 3, and the step length is 2.
The third layer comprises a convolution layer, the convolution kernel of the convolution layer has a size of 3×3, the step size is 1, and the activation function is set as a Relu function.
The fourth layer comprises a convolution layer, the convolution kernel of the convolution layer has a size of 3×3, the step size is 1, and the activation function is set as a Relu function.
The fifth layer comprises a convolution layer and a maximum pooling layer, the convolution layer has a convolution kernel size of 3×3, a step size of 1, and an activation function is set as a Relu function. And then carrying out pooling operation on the convolution result, wherein the convolution kernel size of the maximum pooling layer is 3 multiplied by 3, and the step length is 2.
Step 4-2, a view layer is added after the fifth convolution layer. The view layer processes the convolutional features of the 12 pictures of each model: it compares the 12 pictures and takes the maximum feature value in each dimension across the pictures, generating the feature descriptor of the three-dimensional model, which is then input into the fully connected layers. There are 2 identical fully connected layers, each with 4096 neurons; a ReLU activation function is added to avoid vanishing gradients, and a dropout layer randomly sets neuron values to 0, reducing network parameters and complexity and preventing overfitting.
Step 4-3, a hash layer is added after the fully connected layers. The hash layer contains k hidden neurons (k being the number of bits of the hash code) with a sigmoid activation function. The 4096-dimensional features output by the fully connected layers are mapped to a low-dimensional space to form low-dimensional hash features f_n, which are further converted into discrete hash codes b_n by b_n = sgn(f_n − 0.5). At the same time, a quantization loss function L_ql is designed to control the error of the hash quantization process, subject to b_n ∈ {0, 1}^k, where N is the number of input samples and k is the number of hash code bits. In the experiments, N is set to 9461 and k to 48.
5. Training convolutional neural network architecture
The invention uses the TensorFlow deep learning framework, with Python 3.6 as the language. Training uses the training set of the ModelNet40 dataset, 9461 models in total, with batch_size set to 16 and the learning rate set to 0.0001.
6. Generating model features
Through training on the training set, a network model that learns model hash features well is generated. The hash features of a model are output by the last hash layer, and each model has a 48-bit hash feature; the hash feature of the airplane model in fig. 2 is [011101111001100110110111101110110110001100111010].
7. Model similarity retrieval
The features of each model are represented by the hash code learned by the trained network in the fourth step. The hash layer maps the high-dimensional features of the model to hash code features in the low-dimensional Hamming space. The similarity between models is therefore represented by the Hamming distance D: the larger the Hamming distance, the less similar the models; the smaller the Hamming distance, the more similar. The Hamming distance is computed as D(b_i, b_j) = Σ_{t=1}^{k} (b_i^t ⊕ b_j^t), where b_i and b_j are the hash features of the two models and ⊕ is the exclusive-or operation. For any query three-dimensional model Q, a similarity measurement is made against the three-dimensional models in the database M, and the matching model Q* is computed as:
Q* = S(Q, M) = argmin_{M_m ∈ M} D(b_Q, b_{M_m})
where S represents the similarity between models, M_m is the m-th model in the database (1 ≤ m ≤ N*), and N* is the number of samples in the database M. The matching model Q* is finally obtained, and the 10 models most similar to the query are output as results to the retrieval list. Retrieving the airplane model airplane_0219.off of fig. 2 returns the 10 nearest models ['airplane_0219.off', 'airplane_0115.off', 'airplane_0218.off', 'airplane_0002.off', 'airplane_0027.off', 'airplane_0566.off', 'airplane_0020.off', 'airplane_0374.off', 'airplane_0613.off', 'airplane_0276.off'].
To verify the effectiveness of the present invention, a comparison was made with 5 other advanced algorithms on the public three-dimensional model dataset ModelNet40. Fig. 6 shows the Receiver Operating Characteristic (ROC) curve of each algorithm. The abscissa is the false positive rate (FPR), the proportion of samples predicted positive but actually negative among all negative samples; the ordinate is the true positive rate (TPR), the proportion of samples predicted positive and actually positive among all positive samples. The closer a point on the curve is to the upper left, the higher the true positive rate and the lower the false positive rate, and thus the stronger the algorithm's discrimination and the better its performance. From the results in the figure, the three-dimensional model retrieval method based on the view and hash algorithm performs excellently.
In the above embodiments, the AlexNet convolutional neural network, the ModelNet40 dataset, the TensorFlow deep learning framework, the ReLU activation function, the dropout layer, and the sigmoid activation function are well known in the art.
The foregoing is a detailed description of embodiments of the invention with reference to the accompanying drawings; the description of the embodiments is only intended to facilitate a better understanding of the method of the invention. It will be understood by those skilled in the art that various modifications and equivalents may be made without departing from the spirit and scope of the invention as defined by the appended claims.
Aspects not described in the invention follow the prior art.

Claims (6)

1. A three-dimensional model retrieval method based on a view and hash algorithm, comprising obtaining multiple view pictures captured from different angles of different three-dimensional models, and normalizing them;
constructing a convolutional neural network based on AlexNet: after 5 convolution layers, a view layer connects to two fully connected layers, and a hash layer is added after the last fully connected layer to convert high-dimensional features into low-dimensional hash codes, with a quantization loss function designed for the conversion process to reduce the quantization error of the hash codes;
training the AlexNet-based convolutional neural network with an existing three-dimensional model data set, the features of each model being represented by the hash features learned by the trained network; calculating the similarity between any given query three-dimensional model and the three-dimensional models in the three-dimensional model database using the Hamming distance, where a larger Hamming distance indicates less similar models and a smaller Hamming distance indicates more similar models; sorting in ascending order of Hamming distance, and selecting the top-ranked models as results to output to a retrieval list;
the view layer selects, for each feature dimension, the maximum value over the multiple pictures of the same three-dimensional model after feature extraction by the 5 convolution layers, generating a feature descriptor of the three-dimensional model that is input into the fully connected layers for processing;
the high-dimensional features output by the fully connected layers are transcoded into low-dimensional hash features f_n through the hash layer; the hash features f_n are then converted into discrete hash codes b_n according to b_n = sgn(f_n - 0.5); the quantization loss function L_ql in the conversion process is L_ql = Σ_{n=1}^{N} ||b_n - f_n||², s.t. b_n ∈ {0,1}^k, where N is the number of input samples and k is the number of hash code bits.
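The binarization and quantization loss of claim 1 can be sketched in numpy. The exact loss expression is rendered as an image in the source, so the squared-error form ||b_n - f_n||² used here is an assumption consistent with the stated constraint b_n ∈ {0,1}^k and the binarization rule:

```python
import numpy as np

def binarize(f):
    """b_n = sgn(f_n - 0.5): sigmoid outputs in (0, 1) map to codes in {0, 1}."""
    return (f >= 0.5).astype(np.int8)

def quantization_loss(f):
    """Assumed squared-error form: sum over samples of ||b_n - f_n||^2,
    which is small exactly when the sigmoid outputs lie near 0 or 1."""
    return float(np.sum((binarize(f) - f) ** 2))

f = np.array([[0.9, 0.1, 0.6]])   # one sample, k = 3 hash bits
b = binarize(f)                   # -> [[1, 0, 1]]
loss = quantization_loss(f)       # 0.01 + 0.01 + 0.16 = 0.18
```

Minimizing this loss during training pushes the sigmoid outputs toward 0 or 1, so little information is lost when thresholding.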
2. The retrieval method according to claim 1, wherein model scale standardization is performed on the different three-dimensional models before the view pictures are acquired, scaling models of different scales into a cube with side length 2; the specific steps are as follows:
1) Reading the information of each point of the three-dimensional model, and finding the minimum coordinate point (x_min, y_min, z_min) and the maximum coordinate point (x_max, y_max, z_max);
2) Calculating the difference between the maximum and minimum coordinate points, taking the maximum of the differences in the three dimensions as the side length l of the model bounding box, constructing a cubic bounding box, and placing the center of the model at the center of the cube;
3) Scaling the model to obtain a standardized model: the coordinates (x, y, z) of any point are scaled to obtain new coordinates (x′, y′, z′), calculated as follows:
x′ = (x - x_min) × 2/l - 1
y′ = (y - y_min) × 2/l - 1
z′ = (z - z_min) × 2/l - 1
after normalization, the coordinates of all points of the model lie in [-1, 1] and the model lies in a cube with side length 2, giving the standardized model.
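The scaling of claim 2 can be sketched directly from the formulas above; the two-point "model" below is illustrative:

```python
import numpy as np

def normalize_model(points):
    """Scale a point set into the side-2 cube (claim 2):
    x' = (x - x_min) * 2 / l - 1, with l the largest bounding-box extent."""
    mn = points.min(axis=0)
    l = (points.max(axis=0) - mn).max()   # bounding-box side length
    return (points - mn) * 2.0 / l - 1.0

# Illustrative two-point "model"; bounding-box side length l = 4.
pts = np.array([[0.0, 0.0, 0.0],
                [4.0, 2.0, 1.0]])
out = normalize_model(pts)  # -> [[-1, -1, -1], [1, 0, -0.5]]
```

Using a single scale factor l for all three axes preserves the model's aspect ratio.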
3. The retrieval method according to claim 2, wherein the multi-view picture acquisition process is: a virtual camera array is arranged around the model, 12 view pictures are taken of each model, and the normalized view pictures are processed into a uniform size and then used as input to the convolutional neural network; the specific steps are as follows:
1) Placing the standardized model at the body center of a regular icosahedron, and placing virtual cameras at the 12 vertices of the regular icosahedron for shooting, obtaining a group of 12 views of the model with size 256×256;
2) Cropping the multi-view pictures of the model to a size of 227×227 as input to the convolutional neural network; the cropping method is:
left = C_w/2 - C′_w/2
top = C_h/2 - C′_h/2
right = left + C′_w
bottom = top + C′_h
wherein top, bottom, left and right respectively represent the upper, lower, left and right boundaries of the crop of the new size (C′_w, C′_h) within the original size (C_w, C_h).
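The cropping formulas of claim 3 can be sketched as follows; floor (integer) division for the /2 terms is an assumption, since the claim does not state how odd differences are rounded:

```python
def center_crop_box(cw, ch, new_w, new_h):
    """(left, top, right, bottom) of a centered crop, per the claim's formulas.
    Integer floor division is assumed for the /2 terms."""
    left = cw // 2 - new_w // 2
    top = ch // 2 - new_h // 2
    return left, top, left + new_w, top + new_h

# Crop a 256x256 view down to the 227x227 network input.
box = center_crop_box(256, 256, 227, 227)  # -> (15, 15, 242, 242)
```

The resulting box is 227 pixels wide and high, centered (to within one pixel) in the 256×256 view.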
4. The retrieval method according to claim 3, wherein the specific structure of the AlexNet-based convolutional neural network is:
1) The 12 views of size 227×227 of each model are sequentially input into the convolutional neural network; first, convolution and pooling layers are used to acquire the local features of the images, with the following specific settings:
the first layer comprises a convolution layer and a max pooling layer; the convolution layer has a kernel size of 11×11 and a stride of 4, with the activation function set to ReLU; the convolution result is then pooled, the max pooling layer having a kernel size of 3×3 and a stride of 2;
the second layer comprises a convolution layer and a max pooling layer; the convolution layer has a kernel size of 5×5 and a stride of 1, with the activation function set to ReLU; the convolution result is then pooled, the max pooling layer having a kernel size of 3×3 and a stride of 2;
the third layer comprises a convolution layer with a kernel size of 3×3 and a stride of 1, with the activation function set to ReLU;
the fourth layer comprises a convolution layer with a kernel size of 3×3 and a stride of 1, with the activation function set to ReLU;
the fifth layer comprises a convolution layer and a max pooling layer; the convolution layer has a kernel size of 3×3 and a stride of 1, with the activation function set to ReLU; the convolution result is then pooled, the max pooling layer having a kernel size of 3×3 and a stride of 2;
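The feature-map sizes through the five layers above can be checked with the standard convolution output formula. The padding values below (none for the 11×11 convolution, 2 for the 5×5, 1 for the 3×3 convolutions) are assumptions matching the usual AlexNet configuration, which the claim does not state:

```python
def conv_out(n, k, s, p=0):
    """Spatial output size of a convolution or pooling layer:
    floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

n = 227
n = conv_out(n, 11, 4)       # conv1, 11x11 stride 4 -> 55
n = conv_out(n, 3, 2)        # pool1, 3x3 stride 2   -> 27
n = conv_out(n, 5, 1, p=2)   # conv2, 5x5 stride 1   -> 27 (assumed padding 2)
n = conv_out(n, 3, 2)        # pool2                 -> 13
n = conv_out(n, 3, 1, p=1)   # conv3                 -> 13 (assumed padding 1)
n = conv_out(n, 3, 1, p=1)   # conv4                 -> 13
n = conv_out(n, 3, 1, p=1)   # conv5                 -> 13
n = conv_out(n, 3, 2)        # pool5                 -> 6
```

Under these assumptions the convolutional stack ends in 6×6 feature maps, as in standard AlexNet.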
2) After the convolution layers, a view layer is added following the fifth convolution layer; the convolved features of the 12 pictures of each three-dimensional model are processed by the view layer, which compares the 12 pictures to obtain the maximum value of each feature dimension across the pictures, generating a feature descriptor of the three-dimensional model that is input into the fully connected layers for processing; the 2 fully connected layers each contain 4096 neurons, with a ReLU activation function added to avoid gradient vanishing and a dropout layer added to randomly set neuron values to 0;
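The view layer's element-wise maximum over per-view features can be sketched in numpy; the example uses 2 views with 3-dimensional features for brevity, while in the network it operates on the convolutional features of all 12 views:

```python
import numpy as np

def view_pool(view_features):
    """View layer: for each feature dimension, keep the largest value
    across all views of the same model (element-wise maximum)."""
    return view_features.max(axis=0)

# Illustrative: 2 views, 3 feature dimensions.
feats = np.array([[1.0, 5.0, 2.0],
                  [3.0, 0.0, 4.0]])
desc = view_pool(feats)  # -> [3.0, 5.0, 4.0]
```

Max pooling across views makes the descriptor invariant to the order in which the 12 views are fed in.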
3) A hash layer is added after the fully connected layers; the hash layer contains k hidden-layer neurons, k being the number of hash code bits, with a sigmoid activation function; the 4096-dimensional features output by the fully connected layers are mapped to a low-dimensional space to form low-dimensional hash features f_n, which are further converted into discrete hash codes b_n, the conversion being b_n = sgn(f_n - 0.5); at the same time, the quantization loss function L_ql is designed as L_ql = Σ_{n=1}^{N} ||b_n - f_n||²,
where b_n ∈ {0,1}^k and N is the number of input samples.
5. The retrieval method according to claim 1, wherein when training the network, the public Princeton three-dimensional model data set ModelNet40 is used; the training set data, after model scale standardization and multi-view picture normalization, is input into the AlexNet-based convolutional neural network for training, and the network parameters are optimized to generate a network model; the generated network model is then used to test the model test set.
6. The retrieval method according to claim 1, wherein the hamming distance calculation process is:
the hash code features corresponding to the features of each model are obtained, and the Hamming distance is calculated as D(b_i, b_j) = Σ (b_i ⊕ b_j), where b_i and b_j are the hash features of two models and ⊕ is the exclusive-or operation; for any query three-dimensional model Q, a similarity measurement is performed against the three-dimensional models in the three-dimensional model database M, and the matched model Q* is calculated as:
S(Q, M) = arg min_{M_m} D(b_Q, b_{M_m})
where S represents the similarity between models, M_m represents the m-th model in the database (1 ≤ m ≤ N*), and N* is the number of samples in the database; through this calculation, the 10 models with the highest similarity to the query model are finally output as results to the retrieval list.
CN202010418065.3A 2020-05-18 2020-05-18 Three-dimensional model retrieval method based on view and hash algorithm Active CN111597367B (en)

Publications (2)

Publication Number Publication Date
CN111597367A (en) 2020-08-28
CN111597367B (en) 2023-11-24





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant