CN110147460B - Three-dimensional model retrieval method and device based on convolutional neural network and multi-view map - Google Patents
- Publication number
- CN110147460B (application number CN201910329456.5A)
- Authority: CN (China)
- Prior art keywords
- dimensional model
- image
- rigid
- view
- view angle
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata automatically derived from the content
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
- G06F18/24—Classification techniques
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
Abstract
The embodiments of the application relate to a three-dimensional model retrieval method and device based on a convolutional neural network and multi-view images, an electronic device, and a storage medium. The method comprises the following steps: acquiring a plurality of view images of a non-rigid three-dimensional model, each view image corresponding to a different view angle; extracting the convolution features of each view image through a convolutional neural network; acquiring the image entropy of each view image, and determining the confidence corresponding to the convolution features of each view image according to the image entropy; determining the similarity between the non-rigid three-dimensional model and each entity model in a three-dimensional model library according to the convolution features of the plurality of view images and the corresponding confidences; and retrieving the entity model matched with the non-rigid three-dimensional model according to the similarity. The method, device, electronic device, and storage medium can retrieve three-dimensional models more accurately.
Description
Technical Field
The present disclosure relates to the field of three-dimensional design, and in particular, to a method and an apparatus for retrieving a three-dimensional model based on a convolutional neural network and a multi-view map, an electronic device, and a storage medium.
Background
Compared with media data such as images, videos, and sounds, three-dimensional models and the three-dimensional scenes built from them can present object information more comprehensively and realistically, matching the way the human visual system perceives the world; they are therefore widely applied in product design, virtual reality, film animation, and other fields. Quickly and accurately retrieving similar models from a model library reduces the effort of building duplicate models, facilitates the management and reuse of three-dimensional model assets, and makes three-dimensional design easier and faster. In current model-library construction, three-dimensional model retrieval mostly adopts text-based retrieval, which requires manual model labeling. Because such labeling relies on subjective human judgment, models under the same label may vary greatly, and the retrieval results are inaccurate.
Disclosure of Invention
The embodiment of the application provides a three-dimensional model retrieval method and device based on a convolutional neural network and a multi-view diagram, electronic equipment and a storage medium, and the three-dimensional model can be retrieved more accurately.
A three-dimensional model retrieval method based on a convolutional neural network and a multi-view map comprises the following steps:
acquiring a plurality of view angle images of the non-rigid three-dimensional model, wherein each view angle image corresponds to a different view angle;
extracting convolution characteristics of each view map through a convolution neural network;
acquiring the image entropy of each view image, and determining the confidence corresponding to the convolution feature of each view image according to the image entropy;
determining the similarity between the non-rigid three-dimensional model and each entity model in a three-dimensional model library according to the convolution characteristics of the multiple visual angle images and the corresponding confidence degrees;
and retrieving the entity model matched with the non-rigid three-dimensional model according to the similarity.
A three-dimensional model retrieval device based on a convolutional neural network and a multi-view map comprises:
the acquisition module is used for acquiring a plurality of view angle images of the non-rigid three-dimensional model, and each view angle image corresponds to different view angles;
the extraction module is used for extracting the convolution characteristics of each view map through a convolution neural network;
the confidence coefficient determining module is used for acquiring the image entropy of each view image and determining the confidence coefficient corresponding to the convolution feature of each view image according to the image entropy;
the similarity determining module is used for determining the similarity between the non-rigid three-dimensional model and each entity model in the three-dimensional model library according to the convolution characteristics of the multiple visual angle images and the corresponding confidence coefficients;
and the retrieval module is used for retrieving the entity model matched with the non-rigid three-dimensional model according to the similarity.
An electronic device comprising a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, causes the processor to carry out the method as described above.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method as set forth above.
The three-dimensional model retrieval method, the device, the electronic equipment and the storage medium based on the convolutional neural network and the multi-view images are used for acquiring the multiple view images corresponding to different views in the non-rigid three-dimensional model, extracting the convolution characteristics of each view image through the convolutional neural network, acquiring the image entropy of each view image, determining the confidence corresponding to the convolution characteristics of each view image according to the image entropy, and determining the similarity between the non-rigid three-dimensional model and each entity model in the three-dimensional model library according to the convolution characteristics of the multiple view images and the corresponding confidence, so that the entity model matched with the non-rigid three-dimensional model can be retrieved according to the similarity, and the three-dimensional model can be retrieved more accurately and quickly.
Drawings
FIG. 1 is a flowchart illustrating the steps of a method for retrieving a three-dimensional model based on a convolutional neural network and a multi-view map, according to an embodiment;
FIG. 2 is a schematic diagram of the components of a convolutional neural network in one embodiment;
FIG. 3 is a flowchart illustrating steps taken to obtain multiple perspective views in one embodiment;
FIG. 4 is a schematic diagram illustrating the acquisition of a perspective view of a non-rigid three-dimensional model in one embodiment;
FIG. 5 is a flow diagram of the steps for obtaining image entropy and determining corresponding confidence levels in one embodiment;
FIG. 6 is a block diagram of a three-dimensional model retrieval device based on a convolutional neural network and a multi-view map in one embodiment;
fig. 7 is a block diagram of an electronic device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
It will be understood that, as used herein, the terms "first," "second," and the like may be used herein to describe various elements, but these elements are not limited by these terms. These terms are only used to distinguish one element from another. For example, a first client may be referred to as a second client, and similarly, a second client may be referred to as a first client, without departing from the scope of the present application. Both the first client and the second client are clients, but they are not the same client.
As shown in fig. 1, an embodiment of the present application provides a three-dimensional model retrieval method based on a convolutional neural network and a multi-view map, including the following steps:
and 110, acquiring a plurality of view angle images of the non-rigid three-dimensional model, wherein each view angle image corresponds to a different view angle.
The electronic device can acquire multiple view angle diagrams of a non-rigid three-dimensional model, wherein the non-rigid three-dimensional model can refer to a three-dimensional model needing to be searched, and the three-dimensional model can be deformed under various processing actions. The view angle diagram refers to an image formed by rendering the non-rigid three-dimensional model from a specific angle, the acquired view angle diagrams can respectively correspond to different view angles, namely the view angle diagrams of the non-rigid three-dimensional model can be generated by rendering from different view angles, and the acquired view angle diagrams can contain most of characteristic information such as shape, color and the like of the non-rigid three-dimensional model. Compared with two-dimensional image representation modes of other three-dimensional models such as projection and hand-drawn sketches, the view angle diagram is closer to the real existence of the model.
Step 120: extract the convolution features of each view image through a convolutional neural network.
By reducing the non-rigid three-dimensional model to two-dimensional view images that are easy to process, the electronic device makes the processing objects regular and uniform, which makes it possible to fully transplant convolutional neural networks, which perform excellently in the field of two-dimensional image processing. The acquired view images can serve as a set of two-dimensional descriptions of the non-rigid three-dimensional model. The electronic device can input the acquired view images into a pre-established convolutional neural network and extract the convolution features of each view image through the network. In some embodiments, the convolutional neural network adopts the VGG16 (Visual Geometry Group Network, 16 layers) architecture: a 16-layer deep convolutional neural network built by repeatedly stacking small convolution kernels, comprising 13 convolutional layers and 3 fully connected layers. The convolutional layers can use kernels of size 3 × 3, which reduces the number of network parameters and the computational complexity.
In some embodiments, the convolutional neural network may be pre-established from an image data set, and its model parameters may be trained on that data set. The image data set may contain a large number of two-dimensional images and may be a large-scale data set such as ImageNet. The electronic device can preprocess the images in the data set; the preprocessing may include at least one of translation, rotation, scaling, and symmetric mapping. The convolutional neural network is then trained on the data set containing the preprocessed images, so that it acquires a degree of translation, rotation, and scale invariance. Because the extracted convolution features are thus invariant to such transformations, the retrieval performance does not degrade even if the non-rigid three-dimensional model is translated or rotated, and the retrieval results are more accurate and stable.
In some embodiments, a plurality of maximum pooling layers may be interleaved among the 13 convolutional layers. For the feature map output by the preceding convolutional layer, a maximum pooling layer selects the maximum value of the feature points in a neighborhood; if the values of the other feature points in the neighborhood change slightly, the result after max pooling is unchanged, which reduces the shift in the estimated mean caused by convolutional-layer parameter errors. The convolutional neural network can also be provided with an activation function, which judges whether the feature intensity of the corresponding region reaches a certain threshold. As an embodiment, the network may employ the ReLU activation function, which can be written as equation (1):
f(x)=max(0,x) (1)
when the feature intensity of the region is lower than the standard, 0 can be output, which indicates that the feature of the region cannot be extracted, and when the feature intensity reaches the standard, the corresponding feature can be output, and by using the activation function, the region which is not related to the feature does not influence the training of the convolutional neural network.
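As an illustrative sketch (not part of the patent's claimed implementation), the ReLU activation of equation (1) can be applied element-wise to a feature map, here represented as a plain nested list:

```python
# ReLU of equation (1): f(x) = max(0, x), applied element-wise.
def relu(x):
    return max(0.0, x)

def relu_map(feature_map):
    # Suppress every negative response to 0; keep positive responses as-is.
    return [[relu(v) for v in row] for row in feature_map]

fm = [[-1.5, 0.0], [2.0, -0.3]]
print(relu_map(fm))  # negative entries become 0.0
```

Regions whose responses are entirely negative contribute nothing after ReLU, which matches the description above: features below the threshold do not influence training.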
In some embodiments, a classification layer may be provided after the fully connected layer, and the classification layer may be used to map the multi-dimensional features of the image into an n-dimensional probability distribution.
As a specific embodiment, the classification layer can be a Softmax layer, whose formula can be shown as formula (2):

p_k = exp(Z_k) / Σ_{m=1}^{n} exp(Z_m), k = 1, …, n (2)

where Z is the multidimensional feature output by the fully connected layers and p_k is the probability that the input belongs to the k-th preset category. The probability of the multidimensional feature Z under each preset category is calculated through the fully connected layers and the classification layer, and the convolutional neural network is then trained by gradient descent.
FIG. 2 is a schematic diagram of the components of a convolutional neural network in one embodiment. As shown in fig. 2, the preprocessed image may be input into the convolutional neural network at a size of 224 × 224. Passing through the 13 convolutional layers and the 5 maximum pooling layers interleaved among them yields 512 feature channels; these are fed into the 3 fully connected layers, and the final classification layer (i.e., the Softmax layer) maps them into an n-dimensional probability distribution to determine the category of the image, so as to train the convolutional neural network.
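The Softmax mapping of the classification layer can be sketched as follows; this is a minimal stand-alone illustration with made-up class scores, not the patent's actual implementation:

```python
import math

# Softmax of formula (2): maps an n-dimensional activation vector Z to a
# probability distribution over the n preset categories.
def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

probs = softmax([2.0, 1.0, 0.1])   # illustrative class scores
assert abs(sum(probs) - 1.0) < 1e-9  # a valid probability distribution
```

During training, the category with the largest probability is compared against the label, and the resulting loss is minimized by gradient descent as described above.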
In one embodiment, after the convolutional neural network has been trained on the preprocessed image data set, its fully connected layers and classification layer may be deleted, retaining only the convolutional layers and maximum pooling layers. When extracting the convolution features of each view image of the non-rigid three-dimensional model, the view image may be input into the network at a size of 224 × 224, and the high-dimensional convolution features are output through the convolutional and maximum pooling layers. Optionally, the features output by the last convolutional layer may be used as the convolution features of the corresponding view image, in which case the extracted convolution features of each view image are 512-dimensional.
Step 130: acquire the image entropy of each view image, and determine the confidence corresponding to the convolution features of each view image according to the image entropy.
After extracting the convolution features of each view image through the convolutional neural network, the electronic device can combine them into a feature descriptor of the non-rigid three-dimensional model. The entity model matching the non-rigid three-dimensional model to be retrieved can then be searched for in a three-dimensional model library according to this descriptor. The library can store the feature descriptors of a plurality of entity models; the descriptor of the non-rigid three-dimensional model is compared with the descriptor of each entity model, and the similarity between them is calculated, so that the matching entity model is found.
In some embodiments, because each view image contains a different amount of information, the reliability of the similarity its features characterize also differs, so a confidence must be assigned to the features of each view image. The confidence represents the reliability of the convolution features of the corresponding view image; the greater the confidence, the higher the reliability. The image entropy represents the average number of bits in a set of image gray levels and describes the average amount of information in the image. The electronic device can acquire the image entropy of each view image and determine the confidence of its convolution features accordingly: the denser the picture content, the fuller the colors, and the clearer the contours, the larger the image entropy. A larger image entropy indicates richer information content and a more reliable similarity characterization, so a greater confidence is assigned; a smaller image entropy leads to a smaller confidence.
Step 140: determine the similarity between the non-rigid three-dimensional model and each entity model in the three-dimensional model library according to the convolution features of the plurality of view images and the corresponding confidences.
The electronic device may assign a weight to the convolution features of each view image according to its confidence: the higher the confidence, the larger the weight; the lower the confidence, the smaller the weight. As an implementation, when comparing the non-rigid three-dimensional model with an entity model in the library, the convolution features of each view image of the non-rigid three-dimensional model are compared with those of the corresponding view image of the entity model, and the similarity between the two models is calculated from the comparison results and the corresponding weights. The non-rigid three-dimensional model can be compared with each entity model in the library one by one in this way, and the matching entity model is retrieved according to the similarities.
Step 150: retrieve the entity model matched with the non-rigid three-dimensional model according to the similarity.
In some embodiments, after obtaining the similarity between the non-rigid three-dimensional model and each entity model in the library, the electronic device may output the corresponding entity models in order of similarity. As a specific embodiment, the identity information of the corresponding entity models, such as their numbers, may be output in descending order of similarity. It is to be understood that the identity information may take various forms, is used only to identify an entity model, and is not limited to the numbers mentioned above.
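The ranking step can be sketched as follows, assuming a similarity score has already been computed for every entity model in the library; the model identifiers and scores below are illustrative values, not from the patent:

```python
# Rank entity models by similarity, highest first.
def rank_models(similarities):
    """similarities: dict mapping entity-model id -> similarity score."""
    return sorted(similarities, key=similarities.get, reverse=True)

scores = {"model_07": 0.91, "model_02": 0.34, "model_15": 0.78}
print(rank_models(scores))  # ['model_07', 'model_15', 'model_02']
```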
In the embodiment, a plurality of view images corresponding to different views in the non-rigid three-dimensional model are obtained, the convolution characteristic of each view image is extracted through the convolution neural network, the image entropy of each view image is obtained, the confidence corresponding to the convolution characteristic of each view image is determined according to the image entropy, and the similarity between the non-rigid three-dimensional model and each entity model in the three-dimensional model library is determined according to the convolution characteristics of the plurality of view images and the corresponding confidence, so that the entity model matched with the non-rigid three-dimensional model can be retrieved according to the similarity, and the three-dimensional model can be retrieved more accurately and quickly.
As shown in fig. 3, in an embodiment, step 110, acquiring a plurality of perspective views of the non-rigid three-dimensional model, each perspective view corresponding to a different perspective, includes the following steps:
in the virtual space, the non-rigid three-dimensional model is enclosed in a cube, step 112.
The virtual space refers to the virtual three-dimensional space in which the non-rigid three-dimensional model is constructed. To obtain the view images, the electronic device can enclose the non-rigid three-dimensional model in a cube in the virtual space; the center of the cube may coincide with the center of the model, so that the model is placed exactly in the middle of the cube.
Step 114: arrange corresponding virtual cameras at the centers of the six faces and at the eight vertices of the cube, each virtual camera pointing at the center of the non-rigid three-dimensional model.
The cube comprises six faces and eight vertexes, fourteen virtual cameras can be respectively arranged at the centers of the six faces and the eight vertexes, and all the arranged virtual cameras can point to the center of the non-rigid three-dimensional model, namely all the virtual cameras can point to the center of the cube.
Step 116: render the view image of the corresponding view angle through each virtual camera.
The virtual cameras arranged at different positions can capture view angle diagrams of the non-rigid three-dimensional model under different view angles, and the electronic equipment can render the view angle diagrams corresponding to the view angles through the virtual cameras, so that the view angle diagrams corresponding to the different view angles are obtained.
FIG. 4 is a schematic diagram illustrating the acquisition of a perspective view of a non-rigid three-dimensional model in one embodiment. As shown in fig. 4, a non-rigid three-dimensional model is enclosed in a cube 400 in a virtual space, and fourteen virtual cameras are respectively disposed at the centers of six faces and eight vertices of the cube 400 for rendering view angles of the non-rigid three-dimensional model at different view angles, for example, a view angle rendered at vertex a is a view angle 410, and a view angle rendered at vertex B is a view angle 420.
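The camera placement of steps 112 to 116 can be sketched as follows. The cube half-side r and the coordinate convention (model center at the origin) are assumptions for illustration; the patent only fixes the fourteen positions relative to the cube:

```python
from itertools import product

# Fourteen virtual-camera positions for a cube of half-side r centred on the
# origin: six at the face centres, eight at the vertices, all aimed at (0,0,0).
def camera_positions(r=1.0):
    faces = []
    for axis in range(3):
        for sign in (-1.0, 1.0):
            p = [0.0, 0.0, 0.0]
            p[axis] = sign * r
            faces.append(tuple(p))
    vertices = [(x * r, y * r, z * r)
                for x, y, z in product((-1.0, 1.0), repeat=3)]
    return faces + vertices

cams = camera_positions()
assert len(cams) == 14  # 6 face centres + 8 vertices
```

Each position would then be used as the eye point of a render whose look-at target is the model center, yielding the fourteen view images.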
In this embodiment, the non-rigid three-dimensional model is reduced in dimension and converted into two-dimensional view images that are easy to process, so that the processing objects become regular and uniform, making it possible to fully transplant convolutional neural networks, which perform excellently in the field of two-dimensional image processing.
As shown in fig. 5, in an embodiment, the step 130 of obtaining the image entropy of each view, and determining the confidence corresponding to the convolution feature of each view according to the image entropy includes the following steps:
and 142, acquiring the scale of each view angle image and the pixel gray value of each pixel point.
The image entropy represents the average number of bits in a set of image gray levels and describes the average amount of information in the image. As an embodiment, the denser the picture content, the fuller the colors, and the clearer the contours of a view image, the larger its image entropy. By calculating the image entropy of each view image, the electronic device qualitatively judges the reliability of its convolution features and assigns a confidence accordingly.
In some embodiments, the electronic device may obtain a scale of each view map, where the scale is used to represent a size of the view map, and obtain a pixel gray value of each pixel point included in the view map.
Step 144: determine the gray distribution characteristic of each view image according to its scale and the pixel gray values of its pixel points.
In one embodiment, the gray distribution characteristic of a view image can be determined as shown in formula (3):

P_ij = f(i, j) / N² (3)

where i represents the gray value of a pixel, j represents the mean gray value of its neighborhood, f(i, j) is the frequency with which the characteristic pair (i, j) appears, N is the scale of the view image, and P_ij represents the gray distribution characteristic.
Step 146: calculate the image entropy of each view image according to its scale and gray distribution characteristic.
The electronic equipment can calculate the image entropy of each view map according to the scale and the gray distribution characteristics of each view map, and the image entropy can be used for highlighting the comprehensive characteristics of the gray distribution in the view map.
In one embodiment, the formula for calculating the image entropy can be shown as formula (4):

H = -Σ_i Σ_j P_ij log2(P_ij) (4)

where H represents the image entropy, i represents the pixel gray value, and j the mean gray value of its neighborhood.
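Formulas (3) and (4) together can be sketched as follows. The choice of the 8-connected neighborhood (clamped at the image border) for the neighborhood gray mean is an assumption for illustration; the patent does not specify the neighborhood size:

```python
import math
from collections import Counter

# Two-dimensional image entropy: for each pixel, form the pair (i, j) of
# formula (3), where i is the pixel's gray value and j is the integer mean
# gray value of its neighbours; then apply formula (4) to the pair
# frequencies P_ij = f(i, j) / N^2. img is an N x N grid of gray values.
def image_entropy(img):
    n = len(img)
    pairs = Counter()
    for r in range(n):
        for c in range(n):
            neigh = [img[rr][cc]
                     for rr in range(max(0, r - 1), min(n, r + 2))
                     for cc in range(max(0, c - 1), min(n, c + 2))
                     if (rr, cc) != (r, c)]
            j = sum(neigh) // len(neigh)
            pairs[(img[r][c], j)] += 1
    total = n * n  # N^2 in formula (3)
    return -sum((f / total) * math.log2(f / total) for f in pairs.values())

flat = [[5] * 4 for _ in range(4)]  # uniform image: a single (i, j) pair
assert image_entropy(flat) == 0.0   # -> zero entropy, minimal information
```

A flat, featureless view image thus gets entropy 0 and the smallest confidence, while a detailed view image gets a larger entropy and a larger confidence, as described above.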
Step 148: determine the confidence corresponding to the convolution features of each view image according to the image entropy, where the image entropy and the confidence are positively correlated.
The larger the image entropy is, the richer the information content contained in the view map is, the more reliable the similarity of the feature characterization is, the greater the confidence is given to the view map, and the smaller the image entropy is, the smaller the confidence can be given to the view map.
In one embodiment, the step 140 of determining the similarity between the non-rigid three-dimensional model and each solid model in the three-dimensional model library according to the convolution features and the corresponding confidence degrees of the multiple perspective views comprises the following steps:
(a) Calculate, according to the convolution features of the plurality of view images, the Euclidean distance between each view image and the view image of the corresponding view angle of the entity model.
The electronic device may compare the non-rigid three-dimensional model with each entity model of the three-dimensional model library, may compare the convolution characteristics of the perspective view map of each perspective of the non-rigid three-dimensional model with the convolution characteristics of the perspective view map of the perspective corresponding to the entity model, and calculate the euclidean distance between each perspective view map and the perspective view map of the perspective corresponding to the entity model, respectively, where the euclidean distance may be used to represent the similarity between the perspective view map of the non-rigid three-dimensional model and the perspective view map of the perspective corresponding to the entity model.
As a specific embodiment, the Euclidean distance may be calculated as shown in formula (6):

d(x, y) = sqrt( Σ_{k=1}^{j} (x_k - y_k)² ) (6)

where d represents the Euclidean distance, j is the number of convolution-feature dimensions of a view image, x is the convolution feature of a view image of the non-rigid three-dimensional model, and y is the convolution feature of the corresponding view image of the entity model.
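Formula (6) is the standard Euclidean distance between two feature vectors and can be sketched as:

```python
import math

# Euclidean distance between the convolution feature x of a query view image
# and the feature y of the corresponding view image of an entity model
# (both j-dimensional vectors; in the embodiment above, j = 512).
def euclidean_distance(x, y):
    assert len(x) == len(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

print(euclidean_distance([1.0, 2.0, 2.0], [0.0, 0.0, 0.0]))  # 3.0
```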
(b) Determine the similarity between the non-rigid three-dimensional model and the entity model according to the Euclidean distance of each view image and the corresponding confidence.
The electronic equipment can qualitatively judge the reliability of the Euclidean distance by calculating the image entropy of each view image, so that confidence is given, and the similarity between the non-rigid three-dimensional model and the entity model can be determined according to the Euclidean distance of each view image and the corresponding confidence.
In one embodiment, the similarity between the non-rigid three-dimensional model and the solid model can be measured by formula (7):

S = Σ_{i=1}^{n} w_i · d_i    (7)

wherein S represents the similarity measure, n is the number of acquired view maps, d_i is the Euclidean distance calculated for the i-th view map, and w_i is the confidence corresponding to the i-th view map.
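A minimal sketch of formula (7) as a confidence-weighted combination of the per-view distances (the function name is illustrative, and treating S as an aggregate of distances follows from the definitions above):

```python
def model_similarity(distances, confidences):
    """Formula (7): S = sum_i w_i * d_i, a confidence-weighted
    aggregate of the n per-view Euclidean distances."""
    if len(distances) != len(confidences):
        raise ValueError("need one confidence per view map")
    return sum(w * d for d, w in zip(distances, confidences))
```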
In some embodiments, after obtaining the similarity between the non-rigid three-dimensional model and each solid model in the three-dimensional model library, the electronic device may output the results in a predetermined order, for example in order of similarity, for the user to select from, and take the selected solid model as the solid model matched with the non-rigid three-dimensional model. In some embodiments, the solid model with the highest similarity may instead be taken directly as the matched model.
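The ordered output described above might be sketched as follows; since S in formula (7) aggregates distances, models are assumed here to be ranked by ascending score, smallest score meaning closest match:

```python
def rank_models(scores):
    """Order (model_name, S) pairs for display, best match first.
    S aggregates per-view Euclidean distances, so smaller is better."""
    return sorted(scores, key=lambda pair: pair[1])

ranked = rank_models([("table", 2.1), ("chair", 0.8), ("stool", 1.3)])
```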
In this embodiment, the Euclidean distance to the corresponding view map of the solid model is calculated from the convolution feature of each view map of the non-rigid three-dimensional model, and the similarity is then determined from these Euclidean distances and the corresponding confidences, so that the three-dimensional model can be retrieved more accurately and quickly.
In one embodiment, a convolutional neural network and multi-view based three-dimensional model retrieval method is provided, and comprises the following steps:
Step (1): acquiring a plurality of view maps of the non-rigid three-dimensional model, wherein each view map corresponds to a different view angle.
In one embodiment, before step (1), further comprising: acquiring an image data set, wherein the image data set comprises a plurality of images; performing a pre-processing operation on an image in the image dataset, the pre-processing operation including at least one of translation, rotation, scaling, and symmetric mapping; and training according to the preprocessed image data set to obtain the convolutional neural network.
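A minimal sketch of such a pre-processing (data augmentation) step, using only axis-aligned NumPy operations; real training would also use continuous-angle rotation and scaling:

```python
import numpy as np

def augment(img):
    """Produce pre-processed variants of one training image:
    rotation (one 90-degree step), symmetric mapping (horizontal
    flip) and translation (one-pixel circular shift)."""
    return [
        img,                      # original
        np.rot90(img),            # rotation
        np.fliplr(img),           # symmetric mapping
        np.roll(img, 1, axis=1),  # translation
    ]
```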
In one embodiment, the convolutional neural network comprises 13 convolutional layers and 3 fully-connected layers, wherein a plurality of maximum pooling layers are arranged in the convolutional layers, a classification layer is arranged behind the fully-connected layers, each convolutional layer adopts a convolutional kernel with the size of 3 × 3, and the classification layer is used for mapping multidimensional features extracted by the convolutional layers into n-dimensional probability distribution, wherein n is a positive integer greater than 1; after training to obtain the convolutional neural network according to the preprocessed image data set, the method further comprises the following steps: the full connection layer and the classification layer are deleted.
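The 13-convolutional-layer / 3-fully-connected layout with 3 × 3 kernels matches a VGG-16-style network; the sketch below (the block sizes and the 224-pixel input are assumptions, not stated in the patent) checks the layer count and the feature-map shape that remains after the fully-connected and classification layers are deleted:

```python
# VGG-16-style layout: (number of 3x3 conv layers, output channels)
# per block, with one 2x2 max-pooling layer after each block.
CONV_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

def feature_shape(input_size=224):
    """Spatial size and channel count after the convolutional stage.
    3x3 convolutions with padding 1 preserve the spatial size; each
    2x2 max pool halves it."""
    size, channels = input_size, 3
    for _, out_channels in CONV_BLOCKS:
        channels = out_channels  # convolutions change channel count
        size //= 2               # pooling halves the spatial size
    return size, channels

total_conv_layers = sum(n for n, _ in CONV_BLOCKS)
```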
In one embodiment, step (1) comprises: enclosing a non-rigid three-dimensional model in a cube in a virtual space; respectively arranging corresponding virtual cameras at the centers of six faces and eight vertexes of the cube, wherein each virtual camera points to the center of the non-rigid three-dimensional model; rendering a view angle diagram of the corresponding view angle through each virtual camera respectively.
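The camera placement above can be sketched directly: fourteen viewpoints, one at the center of each of the cube's six faces and one at each of its eight vertices, all aimed at the model's center (the half-edge parameter is an illustrative assumption):

```python
def camera_positions(half_edge=1.0):
    """Positions of the 14 virtual cameras for a cube of the given
    half-edge length centered on the model; each camera points from
    its position toward the origin."""
    face_centers = []
    for axis in range(3):
        for sign in (-1.0, 1.0):
            p = [0.0, 0.0, 0.0]
            p[axis] = sign * half_edge
            face_centers.append(tuple(p))
    vertices = [(sx * half_edge, sy * half_edge, sz * half_edge)
                for sx in (-1.0, 1.0)
                for sy in (-1.0, 1.0)
                for sz in (-1.0, 1.0)]
    return face_centers + vertices

cameras = camera_positions()
```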
Step (2): extracting the convolution features of each view map through a convolutional neural network.
Step (3): acquiring the image entropy of each view map, and determining the confidence corresponding to the convolution features of each view map according to the image entropy.
In one embodiment, step (3) comprises: acquiring the scale of each visual angle image and the pixel gray value of each pixel point contained in the visual angle image; determining the gray distribution characteristics of the corresponding view angle image according to the pixel gray values of all the pixel points; calculating the image entropy of each view image according to the scale and the gray distribution characteristics of each view image; and determining the confidence corresponding to the convolution characteristics of each view image according to the image entropy, wherein the image entropy and the confidence are in positive correlation.
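The image entropy is not given a closed form in this passage; a standard choice consistent with "scale and gray distribution characteristics" is the Shannon entropy of the gray-level histogram, sketched here with NumPy as an assumption:

```python
import numpy as np

def image_entropy(view_map):
    """Shannon entropy of a view map's gray-level distribution:
    H = -sum_g p(g) * log2(p(g)), where p(g) is the fraction of
    pixels taking gray value g. Uniform images score 0; richly
    textured images score higher, hence a higher confidence."""
    _, counts = np.unique(np.asarray(view_map), return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))
```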
Step (4): determining the similarity between the non-rigid three-dimensional model and each solid model in the three-dimensional model library according to the convolution features of the plurality of view maps and the corresponding confidences.
In one embodiment, step (4) comprises: respectively calculating Euclidean distances between each view angle image and the view angle image of the view angle corresponding to the entity model according to the convolution characteristics of the plurality of view angle images; and determining the similarity between the non-rigid three-dimensional model and the solid model according to the Euclidean distance of each view angle diagram and the corresponding confidence coefficient.
Step (5): retrieving the solid model matched with the non-rigid three-dimensional model according to the similarity.
In one embodiment, step (5) comprises: outputting the similarity between the non-rigid three-dimensional model and each solid model in a predetermined order, and taking the selected solid model as the solid model matched with the non-rigid three-dimensional model.
In one embodiment, the Euclidean distance is calculated as follows:

d_i(x, y) = √( Σ_{k=1}^{j} (x_k − y_k)² )

wherein d_i(x, y) represents the Euclidean distance for the i-th view map, j is the dimension of the convolution features, x is the convolution feature of the view map of the non-rigid three-dimensional model, and y is the convolution feature of the corresponding view map of the solid model.

The similarity between the non-rigid three-dimensional model and the solid model is measured as:

S = Σ_{i=1}^{n} w_i · d_i

wherein S represents the similarity, n is the number of acquired view maps, d_i is the Euclidean distance calculated for the i-th view map, and w_i is the confidence corresponding to the i-th view map.
In this embodiment, a plurality of view maps of the non-rigid three-dimensional model corresponding to different view angles are acquired, and the convolution features of each view map are extracted through the convolutional neural network. The image entropy of each view map is acquired, and the confidence corresponding to the convolution features of each view map is determined according to the image entropy. The similarity between the non-rigid three-dimensional model and each solid model in the three-dimensional model library is then determined according to the convolution features of the plurality of view maps and the corresponding confidences, so that the solid model matched with the non-rigid three-dimensional model can be retrieved according to the similarity, making three-dimensional model retrieval more accurate and faster.
As shown in fig. 6, in an embodiment, a three-dimensional model retrieving apparatus 600 based on a convolutional neural network and a multi-view map is provided, which includes an obtaining module 610, an extracting module 620, a confidence determining module 630, a similarity determining module 640, and a retrieving module 650.
The obtaining module 610 is configured to obtain multiple view angle maps of the non-rigid three-dimensional model, where each view angle map corresponds to a different view angle.
And an extracting module 620, configured to extract the convolution feature of each view map through a convolution neural network.
The confidence determining module 630 is configured to obtain an image entropy of each view, and determine a confidence corresponding to the convolution feature of each view according to the image entropy.
And the similarity determining module 640 is configured to determine the similarity between the non-rigid three-dimensional model and each entity model in the three-dimensional model library according to the convolution features of the multiple view maps and the corresponding confidence degrees.
And the retrieval module 650 is used for retrieving the entity model matched with the non-rigid three-dimensional model according to the similarity.
In one embodiment, the apparatus 600 for retrieving a three-dimensional model based on a convolutional neural network and a multi-view map includes a training module in addition to the acquiring module 610, the extracting module 620, the confidence determining module 630, the similarity determining module 640, and the retrieving module 650.
The training module is used for acquiring an image data set, and the image data set comprises a plurality of images; performing a pre-processing operation on an image in the image dataset, the pre-processing operation including at least one of translation, rotation, scaling, and symmetric mapping; and training according to the preprocessed image data set to obtain the convolutional neural network.
In one embodiment, the convolutional neural network includes 13 convolutional layers and 3 fully-connected layers, a plurality of largest pooling layers are disposed in the convolutional layers, a classification layer is disposed behind the fully-connected layers, each convolutional layer adopts a convolutional kernel with a size of 3 × 3, and the classification layer is used for mapping multidimensional features extracted by the convolutional layers into n-dimensional probability distribution, wherein n is a positive integer greater than 1.
In one embodiment, the training module is further configured to delete the fully-connected layer and the classified layer.
In this embodiment, a plurality of view maps of the non-rigid three-dimensional model corresponding to different view angles are acquired, and the convolution features of each view map are extracted through the convolutional neural network. The image entropy of each view map is acquired, and the confidence corresponding to the convolution features of each view map is determined according to the image entropy. The similarity between the non-rigid three-dimensional model and each solid model in the three-dimensional model library is then determined according to the convolution features of the plurality of view maps and the corresponding confidences, so that the solid model matched with the non-rigid three-dimensional model can be retrieved according to the similarity, making three-dimensional model retrieval more accurate and faster.
In one embodiment, the obtaining module 610 includes a surrounding unit, a setting unit, and a rendering unit.
And the surrounding unit is used for surrounding the non-rigid three-dimensional model in a cube in the virtual space.
And the setting unit is used for respectively setting corresponding virtual cameras at the centers of six faces and eight vertexes of the cube, and each virtual camera points to the center of the non-rigid three-dimensional model.
And the rendering unit is used for rendering the view angle graph of the corresponding view angle through each virtual camera.
In this embodiment, the dimensionality of the non-rigid three-dimensional model is reduced by converting it into two-dimensional view maps that are convenient to process, so that the objects to be processed become regular and uniform, making it possible to fully transplant convolutional neural networks that perform excellently in the field of two-dimensional image processing.
In one embodiment, the confidence determining module 630 includes a gray level obtaining unit, a distribution feature determining unit, an image entropy calculating unit, and a confidence determining unit.
And the gray scale acquisition unit is used for acquiring the scale of each visual angle image and the pixel gray scale value of each pixel point.
And the distribution characteristic determining unit is used for determining the gray distribution characteristic of the corresponding view angle image according to the pixel gray value of each pixel point.
And the image entropy calculating unit is used for calculating the image entropy of each view map according to the scale and the gray distribution characteristics of each view map.
And the confidence coefficient determining unit is used for determining the confidence coefficient corresponding to the convolution characteristic of each view image according to the image entropy, and the image entropy and the confidence coefficient are in positive correlation.
In one embodiment, the similarity determination module 640 includes a euclidean distance calculation unit and a similarity determination unit.
And the Euclidean distance calculating unit is used for respectively calculating the Euclidean distance between each view angle image and the view angle image of the view angle corresponding to the entity model according to the convolution characteristics of the plurality of view angle images.
And the similarity determining unit is used for determining the similarity between the non-rigid three-dimensional model and the entity model according to the Euclidean distance of each view angle diagram and the corresponding confidence coefficient.
In one embodiment, the retrieving module 650 is further configured to output the similarity between the non-rigid three-dimensional model and each solid model in a predetermined order, and use the selected solid model as a solid model matching the non-rigid three-dimensional model.
In one embodiment, the Euclidean distance is calculated as follows:

d_i(x, y) = √( Σ_{k=1}^{j} (x_k − y_k)² )

wherein d_i(x, y) represents the Euclidean distance for the i-th view map, j is the dimension of the convolution features, x is the convolution feature of the view map of the non-rigid three-dimensional model, and y is the convolution feature of the corresponding view map of the solid model.

In one embodiment, the similarity between the non-rigid three-dimensional model and the solid model is measured as:

S = Σ_{i=1}^{n} w_i · d_i

wherein S represents the similarity, n is the number of acquired view maps, d_i is the Euclidean distance calculated for the i-th view map, and w_i is the confidence corresponding to the i-th view map.
In this embodiment, the Euclidean distance to the corresponding view map of the solid model is calculated from the convolution feature of each view map of the non-rigid three-dimensional model, and the similarity is then determined from these Euclidean distances and the corresponding confidences, so that the three-dimensional model can be retrieved more accurately and quickly.
Fig. 7 is a block diagram of an electronic device in one embodiment. As shown in fig. 7, in an embodiment, the electronic device 10 may be a server, or may be a terminal device such as a desktop computer or a notebook computer. The electronic device 10 may include one or more of the following components: a processor 11 and a memory 13, wherein one or more application programs may be stored in the memory 13 and configured to be executed by the one or more processors 11, the one or more programs configured to perform the convolutional neural network and multi-view based three-dimensional model retrieval method as described in the above embodiments.
The memory 13 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The memory 13 may be used to store instructions, programs, code sets, or instruction sets. The memory 13 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the above-described method embodiments, and the like. The data storage area may also store data created during use of the electronic device 10, and the like.
It is understood that the electronic device 10 may include more or fewer components than shown in the above block diagram, and is not limited thereto.
In one embodiment, a computer-readable storage medium is further provided, on which a computer program is stored, which when executed by a processor implements the method for retrieving a three-dimensional model based on a convolutional neural network and a multi-view map as described in the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the program is executed. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), or the like.
Any reference to memory, storage, database, or other medium as used herein may include non-volatile and/or volatile memory. Suitable non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (8)
1. A three-dimensional model retrieval method based on a convolutional neural network and a multi-view map is characterized by comprising the following steps:
acquiring a plurality of view angle images of the non-rigid three-dimensional model, wherein each view angle image corresponds to a different view angle;
extracting convolution characteristics of each view map through a convolution neural network;
acquiring the size of each visual angle image and the pixel gray value of each pixel point contained in the visual angle image;
determining the gray distribution characteristics of the corresponding view angle image according to the pixel gray values of the pixel points;
calculating the image entropy of each view image according to the size and the gray distribution characteristics of each view image;
determining a confidence coefficient corresponding to the convolution feature of each view image according to the image entropy, wherein the image entropy and the confidence coefficient are in positive correlation;
determining the similarity between the non-rigid three-dimensional model and each entity model in a three-dimensional model library according to the convolution characteristics of the multiple visual angle images and the corresponding confidence degrees;
retrieving a solid model matched with the non-rigid three-dimensional model according to the similarity;
prior to the acquiring the multiple perspective views of the non-rigid three-dimensional model, the method further comprises:
acquiring an image data set, wherein the image data set comprises a plurality of images;
performing a pre-processing operation on an image in the image dataset, the pre-processing operation including at least one of translation, rotation, scaling, and symmetric mapping;
and training according to the preprocessed image data set to obtain the convolutional neural network.
2. The method of claim 1, wherein the convolutional neural network comprises 13 convolutional layers in which a plurality of max pooling layers are disposed and 3 fully-connected layers after which a classification layer is disposed, each convolutional layer employing a convolutional kernel having a size of 3 x 3, the classification layer being for mapping multidimensional features extracted by the convolutional layers to a probability distribution of n-dimensions, where n is a positive integer greater than 1;
after training from the pre-processed image data set to obtain a convolutional neural network, the method further comprises: and deleting the full connection layer and the classification layer.
3. The method of claim 1, wherein the acquiring multiple perspective views of the non-rigid three-dimensional model comprises:
enclosing a non-rigid three-dimensional model in a cube in a virtual space;
respectively arranging corresponding virtual cameras at the centers of six faces and eight vertexes of the cube, wherein each virtual camera points to the center of the non-rigid three-dimensional model;
rendering a view angle diagram of the corresponding view angle through each virtual camera respectively.
4. The method of claim 1, wherein determining the similarity between the non-rigid three-dimensional model and each solid model in the three-dimensional model library according to the convolution features and the corresponding confidence degrees of the plurality of perspective views comprises:
respectively calculating the Euclidean distance between each view angle image and the view angle image of the view angle corresponding to the entity model according to the convolution characteristics of the plurality of view angle images;
determining the similarity between the non-rigid three-dimensional model and the entity model according to the Euclidean distance of each view angle diagram and the corresponding confidence coefficient;
the retrieving of the solid model matching the non-rigid three-dimensional model according to the similarity comprises:
and outputting the similarity of the non-rigid three-dimensional model and each solid model according to a preset sequence, and taking the selected solid model as a solid model matched with the non-rigid three-dimensional model.
5. The method of claim 4, wherein the Euclidean distance is calculated as follows:

d_i(x, y) = √( Σ_{k=1}^{j} (x_k − y_k)² )

wherein d_i(x, y) represents the Euclidean distance between the convolution features of the i-th view map of the non-rigid three-dimensional model and the i-th view map of the solid model, x is the convolution feature of the i-th view map of the non-rigid three-dimensional model, y is the convolution feature of the corresponding i-th view map of the solid model, j is the dimension of the convolution features x and y, x_k is the k-th dimension of the convolution feature x, and y_k is the k-th dimension of the convolution feature y; and the similarity between the non-rigid three-dimensional model and the solid model is measured as:

S = Σ_{i=1}^{n} w_i · d_i

wherein S represents the similarity, n is the number of acquired view maps, d_i is the calculated Euclidean distance for the i-th view map, and w_i is the confidence corresponding to the i-th view map.
6. A three-dimensional model retrieval device based on a convolutional neural network and a multi-view map is characterized by comprising:
the acquisition module is used for acquiring a plurality of view angle images of the non-rigid three-dimensional model, and each view angle image corresponds to different view angles;
the extraction module is used for extracting the convolution characteristics of each view map through a convolution neural network;
the confidence coefficient determining module is used for acquiring the size of each visual angle image and the pixel gray value of each contained pixel point; determining the gray distribution characteristics of the corresponding view angle image according to the pixel gray values of the pixel points; calculating the image entropy of each view image according to the size and the gray distribution characteristics of each view image; determining a confidence coefficient corresponding to the convolution feature of each view image according to the image entropy, wherein the image entropy and the confidence coefficient are in positive correlation;
the similarity determining module is used for determining the similarity between the non-rigid three-dimensional model and each entity model in the three-dimensional model library according to the convolution characteristics of the multiple visual angle images and the corresponding confidence coefficients;
the retrieval module is used for retrieving the entity model matched with the non-rigid three-dimensional model according to the similarity;
the training module is used for acquiring an image data set, and the image data set comprises a plurality of images; performing a pre-processing operation on an image in the image dataset, the pre-processing operation including at least one of translation, rotation, scaling, and symmetric mapping; and training according to the preprocessed image data set to obtain the convolutional neural network.
7. An electronic device comprising a memory and a processor, the memory having stored therein a computer program that, when executed by the processor, causes the processor to carry out the method of any of claims 1 to 5.
8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910329456.5A CN110147460B (en) | 2019-04-23 | 2019-04-23 | Three-dimensional model retrieval method and device based on convolutional neural network and multi-view map |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110147460A CN110147460A (en) | 2019-08-20 |
CN110147460B true CN110147460B (en) | 2021-08-06 |
Family
ID=67593885
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910329456.5A Expired - Fee Related CN110147460B (en) | 2019-04-23 | 2019-04-23 | Three-dimensional model retrieval method and device based on convolutional neural network and multi-view map |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110147460B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110543581B (en) * | 2019-09-09 | 2023-04-04 | 山东省计算中心(国家超级计算济南中心) | Multi-view three-dimensional model retrieval method based on non-local graph convolution network |
CN112434177B (en) * | 2020-11-27 | 2023-06-20 | 北京邮电大学 | Three-dimensional model retrieval method and device, electronic equipment and storage medium |
CN116643648B (en) * | 2023-04-13 | 2023-12-19 | 中国兵器装备集团自动化研究所有限公司 | Three-dimensional scene matching interaction method, device, equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130039569A1 (en) * | 2010-04-28 | 2013-02-14 | Olympus Corporation | Method and apparatus of compiling image database for three-dimensional object recognition |
CN108009222A (en) * | 2017-11-23 | 2018-05-08 | 浙江工业大学 | Method for searching three-dimension model based on more excellent view and depth convolutional neural networks |
CN108829701A (en) * | 2018-04-25 | 2018-11-16 | 鹰霆(天津)科技有限公司 | A kind of 3D model retrieval method based on sketch |
Non-Patent Citations (5)
Title |
---|
Three-dimensional model retrieval method based on Hpal information entropy fusion; Chen Junying, He Bo, Wang Xianhui; Journal of System Simulation; September 2012; Vol. 24, No. 9; full text *
Three-dimensional model classification and retrieval based on a convolutional neural network and a voting mechanism; Bai Jing, Si Qinglong, Qin Feiwei; Journal of Computer-Aided Design & Computer Graphics; February 2019; Vol. 31, No. 2; sections 1-3 *
Multi-view-based three-dimensional model retrieval system; Xu Lei; China Master's Theses Full-text Database, Information Science and Technology; April 15, 2019; full text *
Research on three-dimensional model retrieval based on feature fusion and manifold ranking; Chen Qiang; China Doctoral Dissertations Full-text Database, Information Science and Technology; March 15, 2017; full text *
Sketch-based three-dimensional model retrieval fusing information entropy and CNN; Liu Yujie, Song Yang, et al.; Journal of Graphics; August 2018; Vol. 39, No. 4; full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111062871B (en) | Image processing method and device, computer equipment and readable storage medium | |
CN109960742B (en) | Local information searching method and device | |
US11704357B2 (en) | Shape-based graphics search | |
CN110659582A (en) | Image conversion model training method, heterogeneous face recognition method, device and equipment | |
WO2022111069A1 (en) | Image processing method and apparatus, electronic device and storage medium | |
CN110147460B (en) | Three-dimensional model retrieval method and device based on convolutional neural network and multi-view map | |
CN106096542B (en) | Image video scene recognition method based on distance prediction information | |
CN110781911B (en) | Image matching method, device, equipment and storage medium | |
CN107291825A (en) | With the search method and system of money commodity in a kind of video | |
CN112085835B (en) | Three-dimensional cartoon face generation method and device, electronic equipment and storage medium | |
US20210117648A1 (en) | 3-dimensional model identification | |
CN113011253B (en) | Facial expression recognition method, device, equipment and storage medium based on ResNeXt network | |
CN111652054A (en) | Joint point detection method, posture recognition method and device | |
CN115630236A (en) | Global fast retrieval positioning method of passive remote sensing image, storage medium and equipment | |
CN112907569A (en) | Head image area segmentation method and device, electronic equipment and storage medium | |
CN112668608A (en) | Image identification method and device, electronic equipment and storage medium | |
CN114298997B (en) | Fake picture detection method, fake picture detection device and storage medium | |
CN111354076A (en) | Single-image three-dimensional part combined modeling method based on embedding space | |
Meng et al. | Merged region based image retrieval | |
CN111414802B (en) | Protein data characteristic extraction method | |
CN114677578A (en) | Method and device for determining training sample data | |
CN114266693A (en) | Image processing method, model generation method and equipment | |
CN113849679A (en) | Image retrieval method, image retrieval device, electronic equipment and storage medium | |
CN114519729A (en) | Image registration quality evaluation model training method and device and computer equipment | |
H'roura et al. | 3D objects descriptors methods: overview and trends |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | |
Granted publication date: 20210806 |