CN109063139B - Three-dimensional model classification and retrieval method based on panorama and multi-channel CNN - Google Patents

Three-dimensional model classification and retrieval method based on panorama and multi-channel CNN

Info

Publication number
CN109063139B
CN109063139B (application CN201810879211.5A)
Authority
CN
China
Prior art keywords
model
dimensional
scale
panorama
view
Prior art date
Legal status
Active
Application number
CN201810879211.5A
Other languages
Chinese (zh)
Other versions
CN109063139A (en)
Inventor
梁祺 (Qi Liang)
聂为之 (Weizhi Nie)
Current Assignee
Tianjin University
Original Assignee
Tianjin University
Priority date
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201810879211.5A priority Critical patent/CN109063139B/en
Publication of CN109063139A publication Critical patent/CN109063139A/en
Application granted granted Critical
Publication of CN109063139B publication Critical patent/CN109063139B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a three-dimensional model view extraction method based on a panorama and a multi-channel CNN, which comprises the following steps: projecting the 3D model onto the side surface of a cylinder satisfying a preset condition, with the origin of the 3D model as the center and the axis of the 3D model parallel to one of the X, Y, Z principal axes, to obtain an initial panorama; sampling the angle φ of the 3D model surface in three-dimensional space and the y coordinate at preset rates, respectively, to obtain two values for each point in the initial panorama, representing the position characteristics of the 3D model surface in three-dimensional space and the orientation characteristics of the 3D model surface; and constructing a multi-scale network and a multi-channel convolutional neural network, taking the position characteristics of the 3D model surface and the orientation characteristics of the 3D model surface as input, and carrying out network training and similarity measurement between two different 3D models. The invention preserves the local and global structural and visual information of the three-dimensional model and automatically computes features of the 2D panoramic view for handling classification and retrieval problems.

Description

Three-dimensional model classification and retrieval method based on panorama and multi-channel CNN
Technical Field
The invention relates to the field of three-dimensional model classification and retrieval, in particular to a three-dimensional model classification and retrieval method based on a panorama and a multi-channel CNN.
Background
With the development of computer vision technology, 3D technology is widely used in the film and television industry, mechanical design, construction, infrastructure, entertainment, medical treatment, and other fields. More and more people are uploading self-designed 3D models to websites, and the number of 3D models keeps growing. This has made 3D model retrieval a hot topic in the field of computer vision. Unlike the conventional visual representation of two-dimensional image information, a three-dimensional model carries not only visual information but also structural information. Conventional computer vision techniques are therefore difficult to apply directly to representing 3D models. In recent years, many approaches have been proposed to address the problem of 3D model representation.
In general, 3D model retrieval methods are largely divided into two categories: model-based methods and view-based methods [1].
Early methods generally belonged to the model-based category, which requires explicit 3D model data for retrieval. Popular model-based methods typically utilize geometric moments [2], surface distributions [3], or three-dimensional shape descriptors [4] to describe the shape of a model. However, the extraction of structural information is computationally expensive, and performance is highly limited by the sampled structure points. Therefore, the practical application of model-based approaches is severely limited.
View-based approaches have attracted more attention in recent years because they represent a 3D model with a set of 2D images. Many sophisticated computer vision techniques can be used directly to process such representations of 3D models, and many classical approaches have been proposed [5][6]. However, the biggest problem with the view-based approach is that it ignores the structural and spatial information of the three-dimensional model.
In recent years, with the development of deep learning, many researchers have begun to utilize classical deep learning methods to address the three-dimensional model retrieval problem, and a number of influential approaches have been proposed. Maturana et al. [7] proposed a novel three-dimensional convolutional neural network based on the classical CNN architecture, which can extract effective feature vectors from structural information. Su et al. [8] proposed a novel CNN network (MVCNN) to process multi-view-based 3D model representations; during network processing, it fuses multi-view information to provide robust features. Kanezaki et al. [9] proposed an improved CNN network to handle the three-dimensional model classification and retrieval problem; the model is designed to use only a partial set of multi-view images for inference and feature learning. Charles et al. [10] presented a novel neural network (PointNet) that directly consumes point clouds; however, this method is only applicable to point-cloud data, which limits its range of application. Wu et al. [11] trained a deep belief network on shapes discretized into a 30³ voxel grid for object classification, shape completion and next-best-view prediction. Sedaghat et al. [12] introduced an auxiliary orientation loss, which improves classification performance compared with the original VoxNet [7]. In general, all of these methods focus on either structural or visual information while ignoring the other, which affects classification and retrieval accuracy.
Disclosure of Invention
The invention provides a three-dimensional model classification and retrieval method based on a panorama and a multi-channel CNN. The invention preserves the local and global structural and visual information of the three-dimensional model and automatically computes features of a 2D panorama for handling classification and retrieval problems, as described in detail below:
a method for extracting a three-dimensional model view based on a panorama and a multi-channel CNN, comprising the following steps:
projecting the 3D model onto the side surface of a cylinder meeting a preset condition, with the origin of the 3D model as the center and the axis of the 3D model parallel to one of the X, Y, Z principal axes, to obtain an initial panorama;
taking an angle φ in a plane formed by any two coordinate axes; sampling the angle φ of the 3D model surface in three-dimensional space and the y coordinate at preset rates, respectively, to obtain a pair of values s(φ, y) for each point in the initial panorama, to represent the position characteristics of the 3D model surface in three-dimensional space and the orientation characteristics of the 3D model surface; and constructing a multi-scale network and a multi-channel convolutional neural network, taking the position characteristics of the 3D model surface and the orientation characteristics of the 3D model surface as input, and carrying out network training and similarity measurement between two different 3D models.
Further, the preset condition is as follows: the height of the cylinder is 2 times the radius of the bottom surface. The preset rates are as follows: the angle φ and the y coordinate are sampled at rates 2B and B, respectively.
Wherein the multi-scale network comprises: extracting view descriptors of different resolutions of the same input picture respectively, wherein the size of the input picture is 256 × 256;
for the first scale, the size is 256 × 256, feature mapping is obtained through the convolution layer of VGG16, and 4096-dimensional feature mapping is obtained through normalization processing;
for the second scale, converting the scale of the input picture into 128 × 128, performing down-sampling, obtaining the feature mapping of the low-resolution picture through the convolution layer, and obtaining 3072-dimensional feature mapping through the maximum pooling layer and normalization processing;
for the third scale, converting the scale of the input picture into 64 x 64, performing down-sampling, obtaining the feature mapping of the low-resolution picture through the convolution layer, and obtaining 3072-dimensional feature mapping through the maximum pooling layer and normalization processing;
performing linear fusion on the outputs of the three scales to obtain a 4096-dimensional feature map, then obtaining a view descriptor through a fully connected layer, and obtaining a classification result vector through a dropout layer and a softmax layer;
finally, the softmax layer outputs the class probability given the input 3D model; the class with the highest probability is considered the predicted class of the 3D model, trained using a stochastic gradient descent method with momentum set to 0.9.
The multi-channel convolutional neural network comprises 6 channels,
branch channels are created and segmented according to the 3 axes of the panoramic view; for the classification task, the probability vector is calculated by taking the mean value of all three individual probability vectors;
each 3D model has 6 descriptors, three of which are spatial distribution descriptors on XYZ axes and are used for describing the position characteristics of the surface of the 3D model;
the other three are normal vector distribution descriptors on XYZ axes, namely the direction characteristics of the 3D model surface;
each 3D model descriptor is compared to the remaining 3D model descriptors using an L1 distance metric for these 6 descriptors.
The technical scheme provided by the invention has the beneficial effects that:
1. according to the invention, each 3D model is represented by using the multi-resolution panoramic view, so that the structure of the 3D model can be effectively represented;
2. the invention provides a novel multi-channel CNN network for extracting visual feature vectors of panoramic views; convolution kernels of different scales are applied in the multi-channel CNN network, so that local and global information of the panoramic views can be preserved and the robustness of the feature vectors improved.
Drawings
FIG. 1 is a flow chart of a method for classification and retrieval of three-dimensional models based on panoramas and multichannel CNNs;
FIG. 2 is a schematic representation of a 3D model and SDM (space distribution map), NDM (normal deviation map) images on three axes;
FIG. 3 is a schematic diagram of a Multi-Scale-NN (Multi-Scale neural network) architecture;
fig. 4 is a schematic diagram of a Multi-Channel-NN (Multi-Channel neural network) architecture.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
Example 1
To solve the above problem, a multi-resolution panoramic view needs to be extracted from each 3D model. Research has shown that the panoramic view of a 3D model can convert the structural information of the 3D model into 2D image information [13]. The embodiment of the invention provides a three-dimensional model view extraction method based on a panorama and a multi-channel CNN, described in detail below with reference to fig. 1 and fig. 2:
101: obtaining an initial panorama by projecting the 3D model surface in fig. 2 onto the side of a cylinder of radius R and height H = 2R, centered at the 3D model origin in fig. 2, with the axis of the 3D model parallel to one of the X, Y, Z principal axes of space;
wherein the value of R is set to 3 × d_max, and d_max is the maximum distance of the 3D model surface from the centroid.
102: assuming that the Z-axis panorama is extracted, the embodiment of the invention uses a set of points s(φ, y) to parameterize the initial panorama, where φ is the angle in the xy-plane; φ and the y coordinate are sampled at rates 2B and B, respectively. In this embodiment, B = 32, 64, 128 is set, which means that each axis needs to be sampled three times. Each point s(φ, y) in the initial panorama then takes values representing two different features of the 3D model surface, respectively:
(1) the position of the model surface in three-dimensional space (called the spatial distribution map, or SDM);
(2) the orientation of the model surface (called the normal deviation map, or NDM).
Thus, for each axis of each 3D model, panoramas of 6 different scales and different values can be obtained, as shown in fig. 2. The left side of fig. 2 depicts the spatial distribution maps, obtained centering on the three axes respectively, representing the position of the model surface in three-dimensional space; the right side depicts the normal deviation maps, obtained centering on the three axes respectively, representing the orientation of the model surface.
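The sampling in steps 101 and 102 can be illustrated with a short sketch. The following Python code is a minimal illustration rather than the patented implementation: it assumes a hypothetical helper cast_ray(origin, direction) that returns the intersection distance and surface normal of the 3D model for a ray (or (None, None) on a miss), and the exact per-cell SDM/NDM values are simplified assumptions.

```python
# Minimal sketch of building the SDM and NDM panoramas for one axis.
# `cast_ray` is a hypothetical ray-casting helper, not part of the patent;
# the value stored per cell is a simplified assumption.
import numpy as np

def extract_panoramas(cast_ray, d_max, B=64):
    R = 3.0 * d_max                 # cylinder radius: 3 x max distance from centroid
    H = 2.0 * R                     # preset condition: height = 2 x base radius
    sdm = np.zeros((B, 2 * B))      # spatial distribution map (position feature)
    ndm = np.zeros((B, 2 * B))      # normal deviation map (orientation feature)
    for j in range(2 * B):          # angle phi sampled at rate 2B
        phi = 2.0 * np.pi * j / (2 * B)
        for i in range(B):          # y coordinate sampled at rate B
            y = -R + H * i / B
            origin = np.array([R * np.cos(phi), R * np.sin(phi), y])
            direction = np.array([-np.cos(phi), -np.sin(phi), 0.0])  # toward the axis
            dist, normal = cast_ray(origin, direction)
            if dist is not None:
                sdm[i, j] = R - dist                               # position of the surface point
                ndm[i, j] = abs(float(np.dot(normal, direction)))  # deviation of the normal
    return sdm, ndm

# Repeating this for B = 32, 64, 128 and for the X, Y, Z axes yields the
# multi-resolution panoramas described above.
```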
In conclusion, the embodiment of the invention avoids the loss of the structure and space information of the three-dimensional model caused by the traditional method, thereby improving the classification and retrieval accuracy.
Example 2
The training of the multi-channel and multi-scale CNN networks and the similarity measure between two different 3D models are described in detail below with reference to the specific network structures, calculation formulas, fig. 3, and fig. 4:
Fig. 3 shows the multi-scale network, which includes three scales and extracts view descriptors at different resolutions from the same input picture. Assume the size of the input picture is 256 × 256. For the first scale, the input keeps the original size of 256 × 256 and passes directly through the VGG16 convolutional neural network to obtain a feature map, and a 4096-dimensional feature map is then obtained through normalization processing;
for the second scale, the input picture is converted to 128 × 128 by down-sampling, so that the generated picture resolution is 1/2 of the original; the feature map of the low-resolution picture is then obtained through the convolutional neural network, processed by a maximum pooling layer, and normalized to obtain a 3072-dimensional feature map;
for the third scale, the input picture is converted to 64 × 64 by down-sampling, so that the generated picture resolution is 1/4 of the original; the feature map of the low-resolution picture is then obtained through the convolutional neural network, processed by a maximum pooling layer, and normalized to obtain a 3072-dimensional feature map;
performing linear fusion on the outputs of the three scales to obtain a 4096-dimensional feature map, then obtaining a view descriptor through a fully connected layer, and obtaining a classification result vector through a dropout layer and a softmax layer;
finally, the softmax layer outputs the class probability given the input 3D model, the class with the highest probability is considered the prediction class of the 3D model, and the network is trained using a stochastic gradient descent method with momentum set to 0.9.
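As a concrete illustration, the following PyTorch sketch follows the three-scale structure described above. It is only a sketch under stated assumptions: sharing one VGG16 backbone across the scales, the pooled feature sizes, and the linear-fusion layer are choices not fixed by the text; training would use torch.optim.SGD with momentum=0.9, as stated.

```python
# Minimal PyTorch sketch of the Multi-Scale-NN; layer sizes and the shared
# VGG16 backbone are assumptions, only the three-scale layout follows the text.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class MultiScaleNN(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.backbone = torchvision.models.vgg16(weights=None).features  # VGG16 conv layers
        self.pool = nn.AdaptiveMaxPool2d((2, 2))        # max pooling for the low-res scales
        self.fc1 = nn.Linear(512 * 8 * 8, 4096)         # scale 1: 256x256 input
        self.fc2 = nn.Linear(512 * 2 * 2, 3072)         # scale 2: 128x128 input
        self.fc3 = nn.Linear(512 * 2 * 2, 3072)         # scale 3: 64x64 input
        self.fuse = nn.Linear(4096 + 3072 + 3072, 4096) # linear fusion -> view descriptor
        self.dropout = nn.Dropout(0.5)
        self.classifier = nn.Linear(4096, num_classes)  # followed by softmax at inference

    def forward(self, x):                               # x: (N, 3, 256, 256)
        f1 = F.normalize(self.fc1(self.backbone(x).flatten(1)))
        f2 = F.normalize(self.fc2(self.pool(self.backbone(F.interpolate(x, size=128))).flatten(1)))
        f3 = F.normalize(self.fc3(self.pool(self.backbone(F.interpolate(x, size=64))).flatten(1)))
        descriptor = self.fuse(torch.cat([f1, f2, f3], dim=1))   # 4096-d view descriptor
        logits = self.classifier(self.dropout(descriptor))
        return descriptor, logits

# Training, per the text: torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```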
Fig. 4 shows a multi-channel convolutional neural network, which includes 6 channels, and extracts 6 different panoramas of a three-dimensional model as input, where the initial resolution of an input picture of each channel is 128 × 256, then the input picture of each channel passes through a multi-scale network to obtain vectors of predicted classification results, and then the vectors are weighted and averaged to obtain a final classification result vector. The network aims to create a branching channel that is segmented according to the 3-axis of the panoramic view. For the classification task, the probability vector is calculated by taking the mean of all three individual probability vectors. The descriptors of the retrieval task consist of the activation of the last fully-connected layer of the convolutional neural network.
Therefore, each 3D model has 6 descriptors, three of which are spatial distribution descriptors on the X, Y, Z axes, used to describe the position features of the 3D model surface; the other three are normal vector distribution descriptors on the X, Y, Z axes, namely the orientation features of the 3D model surface. Each 3D model descriptor is compared with the remaining 3D model descriptors using an L1 distance metric (see the equation below). The L1 distance is used due to its linearity, which emphasizes the differences between the components of the descriptor vectors.
D(Q, M) = Σ_i Σ_j || f(Q_i) − f(M_j) ||_1
Where Q and M represent each 3D model. i and j are indices of the panoramic view. f is the feature vector of the panoramic view extracted by the multi-scale convolutional network, as shown in fig. 3. According to the distance between Q and M, the similarity between two different models can be easily obtained, and the 3D model retrieval task can be processed.
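The distance computation itself reduces to a few lines. The following sketch implements the summed L1 comparison of the equation above, assuming the six per-view feature vectors of each model are stacked into an array.

```python
# Minimal sketch of the L1 similarity measure D(Q, M) between two 3D models.
import numpy as np

def model_distance(f_q, f_m):
    """f_q, f_m: arrays of shape (6, D), one feature vector per panoramic view."""
    return sum(np.abs(fq - fm).sum()      # L1 distance between view descriptors
               for fq in f_q for fm in f_m)
```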
In summary, the embodiments of the present invention avoid the influence of the loss of structural and spatial information on the classification and retrieval of the three-dimensional model caused by representing a 3D model by a group of 2D images, and improve the accuracy of the classification and retrieval.
Example 3
The schemes of embodiments 1 and 2 are verified below with reference to the specific dataset, table 1 and table 2, and described in detail below:
the dataset used to evaluate the proposed classification method is the primston model net large 3d cad model dataset. ModelNet consists of 127,915 CAD models, grouped into 662 object classes, into two subsets, ModelNet-10 and ModelNet-40, both of which contain training and testing partitions.
1) ModelNet10 consists of 4899 CAD models, classified into 10 classes. For convenience of processing, the models are adjusted by placing the center of mass of the model at the origin of the coordinates and normalizing in terms of translation and rotation.
The training and testing subsets of the ModelNet10 are composed of 3991 and 908 models, respectively.
2) ModelNet-40 contains 12,311 CAD models, divided into 40 categories. For ease of handling, these models are adjusted to place the centroid of the model at the origin of the coordinates, but are not normalized.
The training and testing subsets of ModelNet-40 are composed of 9843 and 2468 models, respectively.
The MSMC-NN proposed in the embodiment of the invention was evaluated on the classification task of the test subsets of ModelNet-10 and ModelNet-40. Performance is measured by the average binary classification accuracy (a value of 1 corresponds to the case where the class of the test 3D model is correctly predicted, otherwise 0). The comparison baselines were the original Light Field descriptor [15] (LFD, 4700 dimensions) and Spherical Harmonics [16] (SPH, 544 dimensions), which do not use machine learning and serve to set the evaluation baseline. Also included are methods that use machine learning: 3D ShapeNets [17] (V), the DeepPano descriptor [18], and the Geometry Image descriptor [19]. In addition to the above competing methods, the comparison was extended to include the following techniques: GIFT [20], ORION (V), Set-convolution [21], 3D-GAN [22] (V), VoxNet [7] (V), the PointNet method of Garcia-Garcia et al. [23] (PointNet-Garcia), and Xu and Todorovic [24] (V). The scores of the above competing methods are those reported by the authors in the respective papers. Table 1 summarizes the scores of the above methods and the corresponding experimental results.
Table 1. Classification accuracy on the ModelNet-10 and ModelNet-40 datasets. (V) indicates that the method uses a voxel representation; (NONML) indicates that machine learning is not involved.
[Table 1 data provided as an image in the original document.]
From Table 1, it can be seen that the proposed method outperforms all of the above methods on both the challenging ModelNet-40 dataset and ModelNet-10. It is clear that methods using voxel representations generally perform better than methods using image representations, which can be explained by the richer information contained in 3D volumetric data compared with 2D representations. However, despite using an image representation, the proposed method outperforms the previous methods. Meanwhile, when only the MC-NN is applied to handle the 3D classification problem, the corresponding results also indicate that MSMC-NN gives better results than MC-NN.
Another evaluation of the proposed method was performed on the task of 3D model retrieval. The performance of the method was measured on the ModelNet-10 and ModelNet-40 datasets, compared with the methods that provide retrieval results and the GIFT method. On the ModelNet datasets, retrieval accuracy is measured by the mean average precision (mAP). The comparisons were made with the original Light Field descriptor (LFD, 4700 dimensions) and Spherical Harmonics (SPH, 544 dimensions), which do not use machine learning and set the evaluation baseline. Also included are 3D ShapeNets, the DeepPano descriptor, and the Geometry Image descriptor, which use machine learning methods. The scores of the above competing methods are those reported by the authors in the respective papers. For the ModelNet datasets, table 2 shows the results of the retrieval experiment, where the proposed method outperforms the above competing methods.
Table 2. Mean average precision (mAP) on the ModelNet-10 and ModelNet-40 datasets. (NONML) indicates that machine learning is not involved.
[Table 2 data provided as an image in the original document.]
In summary, the embodiments of the present invention show that adding the multi-scale panoramic view helps to improve performance. Besides being a good shape descriptor, the panorama also bridges the gap between the initial 3D model representation and the 2D input that is generally better suited to convolutional neural networks. The related experiments also demonstrate the effectiveness of the panoramic view, improving classification and retrieval accuracy and showing the effectiveness of the design.
Reference documents:
[1] Anan Liu, Zhongyang Wang, Weizhi Nie, and Yuting Su. Graph-based characteristic view set extraction and matching for 3D model retrieval. Information Sciences, 320:429–442, 2015.
[2] Luren Yang and Fritz Albregtsen. Fast and exact computation of cartesian geometric moments using discrete Green's theorem. Pattern Recognition, 29(7):1061–1073, 1996.
[3] Ke Lu, Qian Wang, Jian Xue, and Weiguo Pan. 3D model retrieval and classification by semi-supervised learning with content-based similarity. Information Sciences, 281:703–713, 2014.
[4] Przemyslaw Polewski, W. Yao, Marco Heurich, Peter Krzystek, and U. Stilla. Detection of fallen trees in ALS point clouds of a temperate forest by combining point/primitive-level shape descriptors. Gemeinsame Tagung, 2014.
[5] Wei-Zhi Nie, An-An Liu, and Yu-Ting Su. 3D object retrieval based on sparse coding in weak supervision. Journal of Visual Communication and Image Representation, 37:40–45, 2016.
[6] Biao Leng, Xiangyang Zhang, Ming Yao, and Zhang Xiong. A 3D model recognition mechanism based on deep Boltzmann machines. Neurocomputing, 151:593–602, 2015.
[7] Daniel Maturana and Sebastian Scherer. VoxNet: A 3D convolutional neural network for real-time object recognition. In Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on, pages 922–928. IEEE, 2015.
[8] Hang Su, Subhransu Maji, Evangelos Kalogerakis, and Erik Learned-Miller. Multi-view convolutional neural networks for 3D shape recognition. pages 945–953, 2016.
[9] Asako Kanezaki, Yasuyuki Matsushita, and Yoshifumi Nishida. RotationNet: Joint object categorization and pose estimation using multiviews from unsupervised viewpoints. 2016.
[10] R. Qi Charles, Hao Su, Kaichun Mo, and Leonidas J. Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. In IEEE Conference on Computer Vision and Pattern Recognition, pages 77–85, 2017.
[11] Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 3D ShapeNets: A deep representation for volumetric shapes. pages 1912–1920, 2014.
[12] Nima Sedaghat, Mohammadreza Zolfaghari, Ehsan Amiri, and Thomas Brox. Orientation-boosted voxel nets for 3D object recognition. arXiv preprint arXiv:1604.03351, 2016.
[13] Panagiotis Papadakis, Ioannis Pratikakis, Theoharis Theoharis, and Stavros Perantonis. PANORAMA: A 3D shape descriptor based on panoramic views for unsupervised 3D object retrieval. International Journal of Computer Vision, 89(2-3):177–192, 2010.
[14] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[15] Ding-Yun Chen, Xiao-Pei Tian, Yu-Te Shen, and Ming Ouhyoung. On visual similarity based 3D model retrieval. In Computer Graphics Forum, volume 22, pages 223–232. Wiley Online Library, 2003.
[16] Michael Kazhdan, Thomas Funkhouser, and Szymon Rusinkiewicz. Rotation invariant spherical harmonic representation of 3D shape descriptors. In Symposium on Geometry Processing, volume 6, pages 156–164, 2003.
[17] Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 3D ShapeNets: A deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1912–1920, 2015.
[18] Baoguang Shi, Song Bai, Zhichao Zhou, and Xiang Bai. DeepPano: Deep panoramic representation for 3-D shape recognition. IEEE Signal Processing Letters, 22(12):2339–2343, 2015.
[19] Ayan Sinha, Jing Bai, and Karthik Ramani. Deep learning 3D shape surfaces using geometry images. In European Conference on Computer Vision, pages 223–240. Springer, 2016.
[20] Song Bai, Xiang Bai, Zhichao Zhou, Zhaoxiang Zhang, and Longin Jan Latecki. GIFT: A real-time and scalable 3D shape search engine. In Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on, pages 5023–5032. IEEE, 2016.
[21] Siamak Ravanbakhsh, Jeff Schneider, and Barnabas Poczos. Deep learning with sets and point clouds. arXiv preprint arXiv:1611.04500, 2016.
[22] Jiajun Wu, Chengkai Zhang, Tianfan Xue, Bill Freeman, and Josh Tenenbaum. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In Advances in Neural Information Processing Systems, pages 82–90, 2016.
[23] Alberto Garcia-Garcia, Francisco Gomez-Donoso, Jose Garcia-Rodriguez, Sergio Orts-Escolano, Miguel Cazorla, and J. Azorin-Lopez. PointNet: A 3D convolutional neural network for real-time object class recognition. In Neural Networks (IJCNN), 2016 International Joint Conference on, pages 1578–1584. IEEE, 2016.
[24] Xu Xu and Sinisa Todorovic. Beam search for learning a deep convolutional neural network of 3D shapes. In Pattern Recognition (ICPR), 2016 23rd International Conference on, pages 3506–3511. IEEE, 2016.
In the embodiment of the present invention, except for the specific description of the model of each device, the model of other devices is not limited, as long as the device can perform the above functions.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and the above-described embodiments of the present invention are merely provided for description and do not represent the merits of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (5)

1. A method for extracting a three-dimensional model view based on a panorama and a multi-channel CNN, characterized by comprising the following steps:
projecting the 3D model onto the side surface of a cylinder meeting a preset condition, with the origin of the 3D model as the center and the axis of the 3D model parallel to one of the X, Y, Z principal axes, to obtain an initial panorama;
taking an angle φ in a plane formed by any two coordinate axes; sampling the angle φ of the 3D model surface in three-dimensional space and the y coordinate at preset rates, respectively, to obtain a pair of values s(φ, y) for each point in the initial panorama, to represent the position characteristics of the 3D model surface in three-dimensional space and the orientation characteristics of the 3D model surface; and constructing a multi-scale network and a multi-channel convolutional neural network, taking the position characteristics of the 3D model surface and the orientation characteristics of the 3D model surface as input, and carrying out network training and similarity measurement between two different 3D models.
2. The method for extracting the three-dimensional model view based on the panorama and the multi-channel CNN as claimed in claim 1, wherein the preset condition is as follows: the height of the cylinder is 2 times the radius of the bottom surface.
3. The method as claimed in claim 1, wherein the preset rates are: sampling the angle φ and the y coordinate at rates 2B and B, respectively.
4. The method for extracting three-dimensional model view based on panorama and multi-channel CNN as claimed in claim 1,
the multi-scale network comprises: extracting view descriptors of different resolutions of the same input picture respectively, wherein the size of the input picture is 256 × 256;
for the first scale, the size is 256 × 256, feature mapping is obtained through the convolution layer of VGG16, and 4096-dimensional feature mapping is obtained through normalization processing;
for the second scale, converting the scale of the input picture into 128 × 128, performing down-sampling, obtaining the feature mapping of the low-resolution picture through the convolution layer, and obtaining 3072-dimensional feature mapping through the maximum pooling layer and normalization processing;
for the third scale, converting the scale of the input picture into 64 x 64, performing down-sampling, obtaining the feature mapping of the low-resolution picture through the convolution layer, and obtaining 3072-dimensional feature mapping through the maximum pooling layer and normalization processing;
performing linear fusion on the outputs of the three scales to obtain a 4096-dimensional feature map, then obtaining a view descriptor through a fully connected layer, and obtaining a classification result vector through a dropout layer and a softmax layer;
finally, the softmax layer outputs the class probability given the input 3D model; the class with the highest probability is considered the predicted class of the 3D model, trained using a stochastic gradient descent method with momentum set to 0.9.
5. The panorama and multi-channel CNN-based three-dimensional model view extraction method of claim 1, wherein the multi-channel convolutional neural network comprises 6 channels,
branch channels are created and segmented according to the 3 axes of the panoramic view; for the classification task, the probability vector is calculated by taking the mean value of all three individual probability vectors;
each 3D model has 6 descriptors, three of which are spatial distribution descriptors on XYZ axes and are used for describing the position characteristics of the surface of the 3D model;
the other three are normal vector distribution descriptors on XYZ axes, namely the direction characteristics of the 3D model surface;
each 3D model descriptor is compared to the remaining 3D model descriptors using an L1 distance metric for these 6 descriptors.
CN201810879211.5A 2018-08-03 2018-08-03 Three-dimensional model classification and retrieval method based on panorama and multi-channel CNN Active CN109063139B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810879211.5A CN109063139B (en) 2018-08-03 2018-08-03 Three-dimensional model classification and retrieval method based on panorama and multi-channel CNN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810879211.5A CN109063139B (en) 2018-08-03 2018-08-03 Three-dimensional model classification and retrieval method based on panorama and multi-channel CNN

Publications (2)

Publication Number Publication Date
CN109063139A CN109063139A (en) 2018-12-21
CN109063139B (en) 2021-08-03

Family

ID=64833158

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810879211.5A Active CN109063139B (en) 2018-08-03 2018-08-03 Three-dimensional model classification and retrieval method based on panorama and multi-channel CNN

Country Status (1)

Country Link
CN (1) CN109063139B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109754006A (en) * 2018-12-26 2019-05-14 清华大学 A kind of view and the stereoscopic vision content categorizing method and system of point cloud fusion
CN110163091B (en) * 2019-04-13 2023-05-26 天津大学 Three-dimensional model retrieval method based on LSTM network multi-mode information fusion
CN110570522B (en) * 2019-08-22 2023-04-07 天津大学 Multi-view three-dimensional reconstruction method
CN110910344B (en) * 2019-10-12 2022-09-13 上海交通大学 Panoramic picture no-reference quality evaluation method, system and equipment
CN111242207A (en) * 2020-01-08 2020-06-05 天津大学 Three-dimensional model classification and retrieval method based on visual saliency information sharing
CN111310670B (en) * 2020-02-19 2024-02-06 江苏理工学院 Multi-view three-dimensional shape recognition method based on predefined and random viewpoints
CN111460193B (en) * 2020-02-28 2022-06-14 天津大学 Three-dimensional model classification method based on multi-mode information fusion
CN111402217B (en) * 2020-03-10 2023-10-31 广州视源电子科技股份有限公司 Image grading method, device, equipment and storage medium
CN112270762A (en) * 2020-11-18 2021-01-26 天津大学 Three-dimensional model retrieval method based on multi-mode fusion
CN116883880B (en) * 2023-09-07 2023-11-28 江苏省特种设备安全监督检验研究院 Crane identification method and device based on AR technology and electronic equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106548516B (en) * 2015-09-23 2021-05-14 清华大学 Three-dimensional roaming method and device
US10115032B2 (en) * 2015-11-04 2018-10-30 Nec Corporation Universal correspondence network
CN106951501B (en) * 2017-03-16 2020-05-12 天津大学 Three-dimensional model retrieval method based on multi-graph matching
CN107967484B (en) * 2017-11-14 2021-03-16 中国计量大学 Image classification method based on multi-resolution
CN107944390B (en) * 2017-11-24 2018-08-24 西安科技大学 Motor-driven vehicle going objects in front video ranging and direction localization method

Also Published As

Publication number Publication date
CN109063139A (en) 2018-12-21

Similar Documents

Publication Publication Date Title
CN109063139B (en) Three-dimensional model classification and retrieval method based on panorama and multi-channel CNN
Ahmed et al. A survey on deep learning advances on different 3D data representations
Li et al. So-net: Self-organizing network for point cloud analysis
Ahmed et al. Deep learning advances on different 3D data representations: A survey
Zhou et al. Voxelnet: End-to-end learning for point cloud based 3d object detection
Zhi et al. LightNet: A Lightweight 3D Convolutional Neural Network for Real-Time 3D Object Recognition.
CN110457515B (en) Three-dimensional model retrieval method of multi-view neural network based on global feature capture aggregation
CN111242207A (en) Three-dimensional model classification and retrieval method based on visual saliency information sharing
CN111460193B (en) Three-dimensional model classification method based on multi-mode information fusion
Hu et al. MAT-Net: Medial Axis Transform Network for 3D Object Recognition.
Zhang et al. Learning rotation-invariant representations of point clouds using aligned edge convolutional neural networks
Liang et al. MVCLN: multi-view convolutional LSTM network for cross-media 3D shape recognition
Zeng et al. Multi-feature fusion based on multi-view feature and 3D shape feature for non-rigid 3D model retrieval
Xuan et al. MV-C3D: A spatial correlated multi-view 3d convolutional neural networks
Wang et al. Multi-view attention-convolution pooling network for 3D point cloud classification
Ding et al. An efficient 3D model retrieval method based on convolutional neural network
Li et al. Deep residual neural network based PointNet for 3D object part segmentation
CN114299339A (en) Three-dimensional point cloud model classification method and system based on regional correlation modeling
Nie et al. The assessment of 3D model representation for retrieval with CNN-RNN networks
Nie et al. Panorama based on multi-channel-attention CNN for 3D model recognition
Zhu et al. Training convolutional neural network from multi-domain contour images for 3D shape retrieval
Chen et al. 3D object classification with point convolution network
Zou et al. A 3D model feature extraction method using curvature-based shape distribution
Ramasinghe et al. Blended convolution and synthesis for efficient discrimination of 3D shapes
Xu et al. Learning discriminative and generative shape embeddings for three-dimensional shape retrieval

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant