CN110147460B - Three-dimensional model retrieval method and device based on convolutional neural network and multi-view map - Google Patents
- Publication number
- CN110147460B (application number CN201910329456.5A)
- Authority: CN (China)
- Prior art keywords
- dimensional model
- image
- rigid
- view
- view angle
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata automatically derived from the content
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
- G06F18/24—Classification techniques
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
Abstract
The embodiments of the application relate to a three-dimensional model retrieval method and device based on a convolutional neural network and multi-view images, an electronic device, and a storage medium. The method comprises the following steps: acquiring a plurality of view images of a non-rigid three-dimensional model, each view image corresponding to a different view angle; extracting the convolution features of each view image through a convolutional neural network; acquiring the image entropy of each view image, and determining the confidence corresponding to the convolution features of each view image according to the image entropy; determining the similarity between the non-rigid three-dimensional model and each entity model in a three-dimensional model library according to the convolution features of the plurality of view images and the corresponding confidences; and retrieving the entity model matched with the non-rigid three-dimensional model according to the similarity. The method, device, electronic device, and storage medium can retrieve three-dimensional models more accurately.
Description
Technical Field
The present disclosure relates to the field of three-dimensional design, and in particular, to a method and an apparatus for retrieving a three-dimensional model based on a convolutional neural network and a multi-view map, an electronic device, and a storage medium.
Background
Compared with media data such as images, videos, and sounds, three-dimensional models and the three-dimensional scenes built from them can present object information more comprehensively and realistically, matching the way the human visual system perceives the world; they are therefore widely applied in product design, virtual reality, film animation, and other fields. Quickly and accurately retrieving similar models from a model library reduces the effort of building duplicate models, facilitates the management and reuse of three-dimensional model assets, and makes three-dimensional design easier and faster. In current model-library construction, three-dimensional model retrieval mostly adopts text-based retrieval, which requires manual model labeling. Because such labeling relies on subjective human judgment, models under the same label may vary greatly, and the retrieval results are inaccurate.
Disclosure of Invention
The embodiment of the application provides a three-dimensional model retrieval method and device based on a convolutional neural network and a multi-view diagram, electronic equipment and a storage medium, and the three-dimensional model can be retrieved more accurately.
A three-dimensional model retrieval method based on a convolutional neural network and a multi-view map comprises the following steps:
acquiring a plurality of view angle images of the non-rigid three-dimensional model, wherein each view angle image corresponds to a different view angle;
extracting convolution characteristics of each view map through a convolution neural network;
acquiring the image entropy of each view image, and determining the confidence corresponding to the convolution feature of each view image according to the image entropy;
determining the similarity between the non-rigid three-dimensional model and each entity model in a three-dimensional model library according to the convolution characteristics of the multiple visual angle images and the corresponding confidence degrees;
and retrieving the entity model matched with the non-rigid three-dimensional model according to the similarity.
A three-dimensional model retrieval device based on a convolutional neural network and a multi-view map comprises:
the acquisition module is used for acquiring a plurality of view angle images of the non-rigid three-dimensional model, and each view angle image corresponds to different view angles;
the extraction module is used for extracting the convolution characteristics of each view map through a convolution neural network;
the confidence coefficient determining module is used for acquiring the image entropy of each view image and determining the confidence coefficient corresponding to the convolution feature of each view image according to the image entropy;
the similarity determining module is used for determining the similarity between the non-rigid three-dimensional model and each entity model in the three-dimensional model library according to the convolution characteristics of the multiple visual angle images and the corresponding confidence coefficients;
and the retrieval module is used for retrieving the entity model matched with the non-rigid three-dimensional model according to the similarity.
An electronic device comprising a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, causes the processor to carry out the method as described above.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method as set forth above.
The three-dimensional model retrieval method, the device, the electronic equipment and the storage medium based on the convolutional neural network and the multi-view images are used for acquiring the multiple view images corresponding to different views in the non-rigid three-dimensional model, extracting the convolution characteristics of each view image through the convolutional neural network, acquiring the image entropy of each view image, determining the confidence corresponding to the convolution characteristics of each view image according to the image entropy, and determining the similarity between the non-rigid three-dimensional model and each entity model in the three-dimensional model library according to the convolution characteristics of the multiple view images and the corresponding confidence, so that the entity model matched with the non-rigid three-dimensional model can be retrieved according to the similarity, and the three-dimensional model can be retrieved more accurately and quickly.
Drawings
FIG. 1 is a flowchart illustrating the steps of a method for retrieving a three-dimensional model based on a convolutional neural network and a multi-view map, according to an embodiment;
FIG. 2 is a schematic diagram of the components of a convolutional neural network in one embodiment;
FIG. 3 is a flowchart illustrating steps taken to obtain multiple perspective views in one embodiment;
FIG. 4 is a schematic diagram illustrating the acquisition of a perspective view of a non-rigid three-dimensional model in one embodiment;
FIG. 5 is a flow diagram of the steps for obtaining image entropy and determining corresponding confidence levels in one embodiment;
FIG. 6 is a block diagram of a three-dimensional model retrieval device based on a convolutional neural network and a multi-view map in one embodiment;
fig. 7 is a block diagram of an electronic device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
It will be understood that, as used herein, the terms "first," "second," and the like may be used herein to describe various elements, but these elements are not limited by these terms. These terms are only used to distinguish one element from another. For example, a first client may be referred to as a second client, and similarly, a second client may be referred to as a first client, without departing from the scope of the present application. Both the first client and the second client are clients, but they are not the same client.
As shown in fig. 1, an embodiment of the present application provides a three-dimensional model retrieval method based on a convolutional neural network and a multi-view map, including the following steps:
and 110, acquiring a plurality of view angle images of the non-rigid three-dimensional model, wherein each view angle image corresponds to a different view angle.
The electronic device can acquire multiple view angle diagrams of a non-rigid three-dimensional model, wherein the non-rigid three-dimensional model can refer to a three-dimensional model needing to be searched, and the three-dimensional model can be deformed under various processing actions. The view angle diagram refers to an image formed by rendering the non-rigid three-dimensional model from a specific angle, the acquired view angle diagrams can respectively correspond to different view angles, namely the view angle diagrams of the non-rigid three-dimensional model can be generated by rendering from different view angles, and the acquired view angle diagrams can contain most of characteristic information such as shape, color and the like of the non-rigid three-dimensional model. Compared with two-dimensional image representation modes of other three-dimensional models such as projection and hand-drawn sketches, the view angle diagram is closer to the real existence of the model.
Step 120: extract the convolution features of each view image through a convolutional neural network.
By reducing the non-rigid three-dimensional model to two-dimensional view images that are easy to process, the electronic device makes the processing objects regular and uniform, which makes it possible to fully transplant convolutional neural networks, which perform excellently in the field of two-dimensional image processing. The acquired view images can serve as a set of two-dimensional descriptions of the non-rigid three-dimensional model. The electronic device can input the acquired view images into a pre-established convolutional neural network and extract the convolution features of each view image through the network. In some embodiments, the convolutional neural network adopts the VGG16 (Visual Geometry Group Network, 16 layers) architecture: a 16-layer deep convolutional neural network built by repeatedly stacking small convolution kernels, comprising 13 convolutional layers and 3 fully connected layers. The convolutional layers can use kernels of size 3 × 3, which reduces the number of network parameters and the computational complexity.
In some embodiments, the convolutional neural network may be pre-established from an image data set, and its model parameters may be trained on that data set. The image data set may contain a large number of two-dimensional images and may be a large-scale data set such as ImageNet. The electronic device can preprocess the images in the data set; the preprocessing may include at least one of translation, rotation, scaling, and symmetric mapping. The convolutional neural network is then trained on the data set containing the preprocessed images, so that it acquires a degree of translation, rotation, and scale invariance. Because the extracted convolution features are thus invariant to such transformations, the retrieval performance does not degrade even if the non-rigid three-dimensional model is translated or rotated, and the retrieval results are more accurate and stable.
In some embodiments, a plurality of maximum pooling layers may be interleaved among the 13 convolutional layers. For the feature map output by the preceding convolutional layer, a maximum pooling layer selects the maximum value of the feature points in a neighborhood; if the values of the other feature points in the neighborhood change slightly, the result after max pooling is unchanged, which reduces the shift in the estimated mean caused by convolutional-layer parameter errors. The convolutional neural network can also be provided with an activation function, which judges whether the feature intensity of the corresponding region reaches a certain threshold. As an embodiment, the network may employ the ReLU activation function, which can be written as equation (1):
f(x)=max(0,x) (1)
when the feature intensity of the region is lower than the standard, 0 can be output, which indicates that the feature of the region cannot be extracted, and when the feature intensity reaches the standard, the corresponding feature can be output, and by using the activation function, the region which is not related to the feature does not influence the training of the convolutional neural network.
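As an illustrative sketch (not part of the patent's claimed implementation), the ReLU activation of equation (1) can be applied element-wise to a feature map, here represented as a plain nested list:

```python
# ReLU of equation (1): f(x) = max(0, x), applied element-wise.
def relu(x):
    return max(0.0, x)

def relu_map(feature_map):
    # Suppress every negative response to 0; keep positive responses as-is.
    return [[relu(v) for v in row] for row in feature_map]

fm = [[-1.5, 0.0], [2.0, -0.3]]
print(relu_map(fm))  # negative entries become 0.0
```

Regions whose responses are entirely negative contribute nothing after ReLU, which matches the description above: features below the threshold do not influence training.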
In some embodiments, a classification layer may be provided after the fully connected layer, and the classification layer may be used to map the multi-dimensional features of the image into an n-dimensional probability distribution.
As a specific embodiment, the classification layer can be a Softmax layer, whose formula can be shown as formula (2):

p_k = exp(Z_k) / Σ_{m=1}^{n} exp(Z_m), k = 1, …, n (2)

where Z is the multidimensional feature output by the fully connected layers and p_k is the probability that the input belongs to the k-th preset category. The probability of the multidimensional feature Z under each preset category is calculated through the fully connected layers and the classification layer, and the convolutional neural network is then trained by gradient descent.
FIG. 2 is a schematic diagram of the components of a convolutional neural network in one embodiment. As shown in fig. 2, the preprocessed image may be input into the convolutional neural network at a size of 224 × 224. Passing through the 13 convolutional layers and the 5 maximum pooling layers interleaved among them yields 512 feature channels; these are fed into the 3 fully connected layers, and the final classification layer (i.e., the Softmax layer) maps them into an n-dimensional probability distribution to determine the category of the image, so as to train the convolutional neural network.
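The Softmax mapping of the classification layer can be sketched as follows; this is a minimal stand-alone illustration with made-up class scores, not the patent's actual implementation:

```python
import math

# Softmax of formula (2): maps an n-dimensional activation vector Z to a
# probability distribution over the n preset categories.
def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

probs = softmax([2.0, 1.0, 0.1])   # illustrative class scores
assert abs(sum(probs) - 1.0) < 1e-9  # a valid probability distribution
```

During training, the category with the largest probability is compared against the label, and the resulting loss is minimized by gradient descent as described above.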
In one embodiment, after the convolutional neural network has been trained on the preprocessed image data set, its fully connected layers and classification layer may be deleted, retaining only the convolutional layers and maximum pooling layers. When extracting the convolution features of each view image of the non-rigid three-dimensional model, the view image may be input into the network at a size of 224 × 224, and the high-dimensional convolution features are output through the convolutional and maximum pooling layers. Optionally, the features output by the last convolutional layer may be used as the convolution features of the corresponding view image, in which case the extracted convolution features of each view image are 512-dimensional.
Step 130: acquire the image entropy of each view image, and determine the confidence corresponding to the convolution features of each view image according to the image entropy.
After extracting the convolution features of each view image through the convolutional neural network, the electronic device can combine them into a feature descriptor of the non-rigid three-dimensional model. The entity model matching the non-rigid three-dimensional model to be retrieved can then be searched for in a three-dimensional model library according to this descriptor. The library can store the feature descriptors of a plurality of entity models; the descriptor of the non-rigid three-dimensional model is compared with the descriptor of each entity model, and the similarity between them is calculated, so that the matching entity model is found.
In some embodiments, because each view image contains a different amount of information, the reliability of the similarity its features characterize also differs, so a confidence must be assigned to the features of each view image. The confidence represents the reliability of the convolution features of the corresponding view image; the greater the confidence, the higher the reliability. The image entropy represents the average number of bits in a set of image gray levels and describes the average amount of information in the image. The electronic device can acquire the image entropy of each view image and determine the confidence of its convolution features accordingly: the denser the picture content, the fuller the colors, and the clearer the contours, the larger the image entropy. A larger image entropy indicates richer information content and a more reliable similarity characterization, so a greater confidence is assigned; a smaller image entropy leads to a smaller confidence.
Step 140: determine the similarity between the non-rigid three-dimensional model and each entity model in the three-dimensional model library according to the convolution features of the plurality of view images and the corresponding confidences.
The electronic device may assign a weight to the convolution features of each view image according to its confidence: the higher the confidence, the larger the weight; the lower the confidence, the smaller the weight. As an implementation, when comparing the non-rigid three-dimensional model with an entity model in the library, the convolution features of each view image of the non-rigid three-dimensional model are compared with those of the corresponding view image of the entity model, and the similarity between the two models is calculated from the comparison results and the corresponding weights. The non-rigid three-dimensional model can be compared with each entity model in the library one by one in this way, and the matching entity model is retrieved according to the similarities.
Step 150: retrieve the entity model matched with the non-rigid three-dimensional model according to the similarity.
In some embodiments, after obtaining the similarity between the non-rigid three-dimensional model and each entity model in the library, the electronic device may output the corresponding entity models in order of similarity. As a specific embodiment, the identity information of the corresponding entity models, such as their numbers, may be output in descending order of similarity. It is to be understood that the identity information may take various forms, is used only to identify an entity model, and is not limited to the numbers mentioned above.
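The ranking step can be sketched as follows, assuming a similarity score has already been computed for every entity model in the library; the model identifiers and scores below are illustrative values, not from the patent:

```python
# Rank entity models by similarity, highest first.
def rank_models(similarities):
    """similarities: dict mapping entity-model id -> similarity score."""
    return sorted(similarities, key=similarities.get, reverse=True)

scores = {"model_07": 0.91, "model_02": 0.34, "model_15": 0.78}
print(rank_models(scores))  # ['model_07', 'model_15', 'model_02']
```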
In the embodiment, a plurality of view images corresponding to different views in the non-rigid three-dimensional model are obtained, the convolution characteristic of each view image is extracted through the convolution neural network, the image entropy of each view image is obtained, the confidence corresponding to the convolution characteristic of each view image is determined according to the image entropy, and the similarity between the non-rigid three-dimensional model and each entity model in the three-dimensional model library is determined according to the convolution characteristics of the plurality of view images and the corresponding confidence, so that the entity model matched with the non-rigid three-dimensional model can be retrieved according to the similarity, and the three-dimensional model can be retrieved more accurately and quickly.
As shown in fig. 3, in an embodiment, step 110, acquiring a plurality of perspective views of the non-rigid three-dimensional model, each perspective view corresponding to a different perspective, includes the following steps:
in the virtual space, the non-rigid three-dimensional model is enclosed in a cube, step 112.
The virtual space refers to the virtual three-dimensional space in which the non-rigid three-dimensional model is constructed. To obtain the view images, the electronic device can enclose the non-rigid three-dimensional model in a cube in the virtual space; the center of the cube may coincide with the center of the model, so that the model is placed exactly in the middle of the cube.
Step 114: arrange corresponding virtual cameras at the centers of the six faces and at the eight vertices of the cube, each virtual camera pointing at the center of the non-rigid three-dimensional model.
The cube comprises six faces and eight vertexes, fourteen virtual cameras can be respectively arranged at the centers of the six faces and the eight vertexes, and all the arranged virtual cameras can point to the center of the non-rigid three-dimensional model, namely all the virtual cameras can point to the center of the cube.
Step 116: render the view image of the corresponding view angle through each virtual camera.
The virtual cameras arranged at different positions can capture view angle diagrams of the non-rigid three-dimensional model under different view angles, and the electronic equipment can render the view angle diagrams corresponding to the view angles through the virtual cameras, so that the view angle diagrams corresponding to the different view angles are obtained.
FIG. 4 is a schematic diagram illustrating the acquisition of a perspective view of a non-rigid three-dimensional model in one embodiment. As shown in fig. 4, a non-rigid three-dimensional model is enclosed in a cube 400 in a virtual space, and fourteen virtual cameras are respectively disposed at the centers of six faces and eight vertices of the cube 400 for rendering view angles of the non-rigid three-dimensional model at different view angles, for example, a view angle rendered at vertex a is a view angle 410, and a view angle rendered at vertex B is a view angle 420.
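The camera placement of steps 112 to 116 can be sketched as follows. The cube half-side r and the coordinate convention (model center at the origin) are assumptions for illustration; the patent only fixes the fourteen positions relative to the cube:

```python
from itertools import product

# Fourteen virtual-camera positions for a cube of half-side r centred on the
# origin: six at the face centres, eight at the vertices, all aimed at (0,0,0).
def camera_positions(r=1.0):
    faces = []
    for axis in range(3):
        for sign in (-1.0, 1.0):
            p = [0.0, 0.0, 0.0]
            p[axis] = sign * r
            faces.append(tuple(p))
    vertices = [(x * r, y * r, z * r)
                for x, y, z in product((-1.0, 1.0), repeat=3)]
    return faces + vertices

cams = camera_positions()
assert len(cams) == 14  # 6 face centres + 8 vertices
```

Each position would then be used as the eye point of a render whose look-at target is the model center, yielding the fourteen view images.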
In this embodiment, the non-rigid three-dimensional model is reduced in dimension and converted into two-dimensional view images that are easy to process, so that the processing objects become regular and uniform, making it possible to fully transplant convolutional neural networks, which perform excellently in the field of two-dimensional image processing.
As shown in fig. 5, in an embodiment, the step 130 of obtaining the image entropy of each view, and determining the confidence corresponding to the convolution feature of each view according to the image entropy includes the following steps:
and 142, acquiring the scale of each view angle image and the pixel gray value of each pixel point.
The image entropy represents the average number of bits in a set of image gray levels and describes the average amount of information in the image. As an embodiment, the denser the picture content, the fuller the colors, and the clearer the contours of a view image, the larger its image entropy. By calculating the image entropy of each view image, the electronic device qualitatively judges the reliability of its convolution features and assigns a confidence accordingly.
In some embodiments, the electronic device may obtain a scale of each view map, where the scale is used to represent a size of the view map, and obtain a pixel gray value of each pixel point included in the view map.
Step 144: determine the gray distribution characteristic of each view image according to its scale and the pixel gray values of its pixel points.
In one embodiment, the gray distribution characteristic of a view image can be determined as shown in formula (3):

P_ij = f(i, j) / N² (3)

where i represents the gray value of a pixel, j represents the mean gray value of its neighborhood, f(i, j) is the frequency with which the characteristic pair (i, j) appears, N is the scale of the view image, and P_ij represents the gray distribution characteristic.
Step 146: calculate the image entropy of each view image according to its scale and gray distribution characteristic.
The electronic equipment can calculate the image entropy of each view map according to the scale and the gray distribution characteristics of each view map, and the image entropy can be used for highlighting the comprehensive characteristics of the gray distribution in the view map.
In one embodiment, the formula for calculating the image entropy can be shown as formula (4):

H = -Σ_i Σ_j P_ij log2(P_ij) (4)

where H represents the image entropy, i represents the pixel gray value, and j the mean gray value of its neighborhood.
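Formulas (3) and (4) together can be sketched as follows. The choice of the 8-connected neighborhood (clamped at the image border) for the neighborhood gray mean is an assumption for illustration; the patent does not specify the neighborhood size:

```python
import math
from collections import Counter

# Two-dimensional image entropy: for each pixel, form the pair (i, j) of
# formula (3), where i is the pixel's gray value and j is the integer mean
# gray value of its neighbours; then apply formula (4) to the pair
# frequencies P_ij = f(i, j) / N^2. img is an N x N grid of gray values.
def image_entropy(img):
    n = len(img)
    pairs = Counter()
    for r in range(n):
        for c in range(n):
            neigh = [img[rr][cc]
                     for rr in range(max(0, r - 1), min(n, r + 2))
                     for cc in range(max(0, c - 1), min(n, c + 2))
                     if (rr, cc) != (r, c)]
            j = sum(neigh) // len(neigh)
            pairs[(img[r][c], j)] += 1
    total = n * n  # N^2 in formula (3)
    return -sum((f / total) * math.log2(f / total) for f in pairs.values())

flat = [[5] * 4 for _ in range(4)]  # uniform image: a single (i, j) pair
assert image_entropy(flat) == 0.0   # -> zero entropy, minimal information
```

A flat, featureless view image thus gets entropy 0 and the smallest confidence, while a detailed view image gets a larger entropy and a larger confidence, as described above.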
Step 148: determine the confidence corresponding to the convolution features of each view image according to the image entropy, where the image entropy and the confidence are positively correlated.
The larger the image entropy is, the richer the information content contained in the view map is, the more reliable the similarity of the feature characterization is, the greater the confidence is given to the view map, and the smaller the image entropy is, the smaller the confidence can be given to the view map.
In one embodiment, the step 140 of determining the similarity between the non-rigid three-dimensional model and each solid model in the three-dimensional model library according to the convolution features and the corresponding confidence degrees of the multiple perspective views comprises the following steps:
(a) Calculate, according to the convolution features of the plurality of view images, the Euclidean distance between each view image and the view image of the corresponding view angle of the entity model.
The electronic device may compare the non-rigid three-dimensional model with each entity model of the three-dimensional model library, may compare the convolution characteristics of the perspective view map of each perspective of the non-rigid three-dimensional model with the convolution characteristics of the perspective view map of the perspective corresponding to the entity model, and calculate the euclidean distance between each perspective view map and the perspective view map of the perspective corresponding to the entity model, respectively, where the euclidean distance may be used to represent the similarity between the perspective view map of the non-rigid three-dimensional model and the perspective view map of the perspective corresponding to the entity model.
As a specific embodiment, the Euclidean distance may be calculated as shown in formula (6):

d(x, y) = sqrt( Σ_{k=1}^{j} (x_k - y_k)² ) (6)

where d represents the Euclidean distance, j is the number of convolution-feature dimensions of a view image, x is the convolution feature of a view image of the non-rigid three-dimensional model, and y is the convolution feature of the corresponding view image of the entity model.
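Formula (6) is the standard Euclidean distance between two feature vectors and can be sketched as:

```python
import math

# Euclidean distance between the convolution feature x of a query view image
# and the feature y of the corresponding view image of an entity model
# (both j-dimensional vectors; in the embodiment above, j = 512).
def euclidean_distance(x, y):
    assert len(x) == len(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

print(euclidean_distance([1.0, 2.0, 2.0], [0.0, 0.0, 0.0]))  # 3.0
```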
(b) Determine the similarity between the non-rigid three-dimensional model and the entity model according to the Euclidean distance of each view image and the corresponding confidence.
The electronic equipment can qualitatively judge the reliability of the Euclidean distance by calculating the image entropy of each view image, so that confidence is given, and the similarity between the non-rigid three-dimensional model and the entity model can be determined according to the Euclidean distance of each view image and the corresponding confidence.
In one embodiment, the similarity between the non-rigid three-dimensional model and the solid model can be measured by formula (7):

S = Σ_{i=1}^{n} w_i · d_i    (7)

wherein S represents the similarity measure, n is the number of acquired view maps, d_i is the Euclidean distance calculated for the i-th view map, and w_i is the confidence corresponding to the i-th view map.
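A minimal sketch of formula (7) as a confidence-weighted combination of the per-view distances (the function name is illustrative, and treating S as an aggregate of distances follows from the definitions above):

```python
def model_similarity(distances, confidences):
    """Formula (7): S = sum_i w_i * d_i, a confidence-weighted
    aggregate of the n per-view Euclidean distances."""
    if len(distances) != len(confidences):
        raise ValueError("need one confidence per view map")
    return sum(w * d for d, w in zip(distances, confidences))
```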
In some embodiments, after obtaining the similarity between the non-rigid three-dimensional model and each solid model in the three-dimensional model library, the electronic device may output the results in a predetermined order, for example in order of similarity, for the user to select from, and take the selected solid model as the solid model matched with the non-rigid three-dimensional model. In some embodiments, the solid model with the highest similarity may instead be taken directly as the matched model.
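The ordered output described above might be sketched as follows; since S in formula (7) aggregates distances, models are assumed here to be ranked by ascending score, smallest score meaning closest match:

```python
def rank_models(scores):
    """Order (model_name, S) pairs for display, best match first.
    S aggregates per-view Euclidean distances, so smaller is better."""
    return sorted(scores, key=lambda pair: pair[1])

ranked = rank_models([("table", 2.1), ("chair", 0.8), ("stool", 1.3)])
```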
In this embodiment, the Euclidean distance to the corresponding view map of the solid model is calculated from the convolution feature of each view map of the non-rigid three-dimensional model, and the similarity is then determined from these Euclidean distances and the corresponding confidences, so that the three-dimensional model can be retrieved more accurately and quickly.
In one embodiment, a convolutional neural network and multi-view based three-dimensional model retrieval method is provided, and comprises the following steps:
Step (1): acquiring a plurality of view maps of the non-rigid three-dimensional model, wherein each view map corresponds to a different view angle.
In one embodiment, before step (1), further comprising: acquiring an image data set, wherein the image data set comprises a plurality of images; performing a pre-processing operation on an image in the image dataset, the pre-processing operation including at least one of translation, rotation, scaling, and symmetric mapping; and training according to the preprocessed image data set to obtain the convolutional neural network.
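A minimal sketch of such a pre-processing (data augmentation) step, using only axis-aligned NumPy operations; real training would also use continuous-angle rotation and scaling:

```python
import numpy as np

def augment(img):
    """Produce pre-processed variants of one training image:
    rotation (one 90-degree step), symmetric mapping (horizontal
    flip) and translation (one-pixel circular shift)."""
    return [
        img,                      # original
        np.rot90(img),            # rotation
        np.fliplr(img),           # symmetric mapping
        np.roll(img, 1, axis=1),  # translation
    ]
```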
In one embodiment, the convolutional neural network comprises 13 convolutional layers and 3 fully-connected layers, wherein a plurality of maximum pooling layers are arranged in the convolutional layers, a classification layer is arranged behind the fully-connected layers, each convolutional layer adopts a convolutional kernel with the size of 3 × 3, and the classification layer is used for mapping multidimensional features extracted by the convolutional layers into n-dimensional probability distribution, wherein n is a positive integer greater than 1; after training to obtain the convolutional neural network according to the preprocessed image data set, the method further comprises the following steps: the full connection layer and the classification layer are deleted.
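The 13-convolutional-layer / 3-fully-connected layout with 3 × 3 kernels matches a VGG-16-style network; the sketch below (the block sizes and the 224-pixel input are assumptions, not stated in the patent) checks the layer count and the feature-map shape that remains after the fully-connected and classification layers are deleted:

```python
# VGG-16-style layout: (number of 3x3 conv layers, output channels)
# per block, with one 2x2 max-pooling layer after each block.
CONV_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

def feature_shape(input_size=224):
    """Spatial size and channel count after the convolutional stage.
    3x3 convolutions with padding 1 preserve the spatial size; each
    2x2 max pool halves it."""
    size, channels = input_size, 3
    for _, out_channels in CONV_BLOCKS:
        channels = out_channels  # convolutions change channel count
        size //= 2               # pooling halves the spatial size
    return size, channels

total_conv_layers = sum(n for n, _ in CONV_BLOCKS)
```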
In one embodiment, step (1) comprises: enclosing a non-rigid three-dimensional model in a cube in a virtual space; respectively arranging corresponding virtual cameras at the centers of six faces and eight vertexes of the cube, wherein each virtual camera points to the center of the non-rigid three-dimensional model; rendering a view angle diagram of the corresponding view angle through each virtual camera respectively.
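The camera placement above can be sketched directly: fourteen viewpoints, one at the center of each of the cube's six faces and one at each of its eight vertices, all aimed at the model's center (the half-edge parameter is an illustrative assumption):

```python
def camera_positions(half_edge=1.0):
    """Positions of the 14 virtual cameras for a cube of the given
    half-edge length centered on the model; each camera points from
    its position toward the origin."""
    face_centers = []
    for axis in range(3):
        for sign in (-1.0, 1.0):
            p = [0.0, 0.0, 0.0]
            p[axis] = sign * half_edge
            face_centers.append(tuple(p))
    vertices = [(sx * half_edge, sy * half_edge, sz * half_edge)
                for sx in (-1.0, 1.0)
                for sy in (-1.0, 1.0)
                for sz in (-1.0, 1.0)]
    return face_centers + vertices

cameras = camera_positions()
```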
Step (2): extracting the convolution features of each view map through a convolutional neural network.
Step (3): acquiring the image entropy of each view map, and determining the confidence corresponding to the convolution features of each view map according to the image entropy.
In one embodiment, step (3) comprises: acquiring the scale of each visual angle image and the pixel gray value of each pixel point contained in the visual angle image; determining the gray distribution characteristics of the corresponding view angle image according to the pixel gray values of all the pixel points; calculating the image entropy of each view image according to the scale and the gray distribution characteristics of each view image; and determining the confidence corresponding to the convolution characteristics of each view image according to the image entropy, wherein the image entropy and the confidence are in positive correlation.
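The image entropy is not given a closed form in this passage; a standard choice consistent with "scale and gray distribution characteristics" is the Shannon entropy of the gray-level histogram, sketched here with NumPy as an assumption:

```python
import numpy as np

def image_entropy(view_map):
    """Shannon entropy of a view map's gray-level distribution:
    H = -sum_g p(g) * log2(p(g)), where p(g) is the fraction of
    pixels taking gray value g. Uniform images score 0; richly
    textured images score higher, hence a higher confidence."""
    _, counts = np.unique(np.asarray(view_map), return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))
```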
Step (4): determining the similarity between the non-rigid three-dimensional model and each solid model in the three-dimensional model library according to the convolution features of the plurality of view maps and the corresponding confidences.
In one embodiment, step (4) comprises: respectively calculating Euclidean distances between each view angle image and the view angle image of the view angle corresponding to the entity model according to the convolution characteristics of the plurality of view angle images; and determining the similarity between the non-rigid three-dimensional model and the solid model according to the Euclidean distance of each view angle diagram and the corresponding confidence coefficient.
Step (5): retrieving the solid model matched with the non-rigid three-dimensional model according to the similarity.
In one embodiment, step (5) comprises: outputting the similarity between the non-rigid three-dimensional model and each solid model in a predetermined order, and taking the selected solid model as the solid model matched with the non-rigid three-dimensional model.
In one embodiment, the Euclidean distance is calculated as follows:

d_i(x, y) = √( Σ_{k=1}^{j} (x_k − y_k)² )

wherein d_i(x, y) represents the Euclidean distance for the i-th view map, j is the dimension of the convolution features, x is the convolution feature of the view map of the non-rigid three-dimensional model, and y is the convolution feature of the corresponding view map of the solid model.

The similarity between the non-rigid three-dimensional model and the solid model is measured as:

S = Σ_{i=1}^{n} w_i · d_i

wherein S represents the similarity, n is the number of acquired view maps, d_i is the Euclidean distance calculated for the i-th view map, and w_i is the confidence corresponding to the i-th view map.
In this embodiment, a plurality of view maps of the non-rigid three-dimensional model corresponding to different view angles are acquired, and the convolution features of each view map are extracted through the convolutional neural network. The image entropy of each view map is acquired, and the confidence corresponding to the convolution features of each view map is determined according to the image entropy. The similarity between the non-rigid three-dimensional model and each solid model in the three-dimensional model library is then determined according to the convolution features of the plurality of view maps and the corresponding confidences, so that the solid model matched with the non-rigid three-dimensional model can be retrieved according to the similarity, making three-dimensional model retrieval more accurate and faster.
As shown in fig. 6, in an embodiment, a three-dimensional model retrieving apparatus 600 based on a convolutional neural network and a multi-view map is provided, which includes an obtaining module 610, an extracting module 620, a confidence determining module 630, a similarity determining module 640, and a retrieving module 650.
The obtaining module 610 is configured to obtain multiple view angle maps of the non-rigid three-dimensional model, where each view angle map corresponds to a different view angle.
And an extracting module 620, configured to extract the convolution feature of each view map through a convolution neural network.
The confidence determining module 630 is configured to obtain an image entropy of each view, and determine a confidence corresponding to the convolution feature of each view according to the image entropy.
And the similarity determining module 640 is configured to determine the similarity between the non-rigid three-dimensional model and each entity model in the three-dimensional model library according to the convolution features of the multiple view maps and the corresponding confidence degrees.
And the retrieval module 650 is used for retrieving the entity model matched with the non-rigid three-dimensional model according to the similarity.
In one embodiment, the apparatus 600 for retrieving a three-dimensional model based on a convolutional neural network and a multi-view map includes a training module in addition to the acquiring module 610, the extracting module 620, the confidence determining module 630, the similarity determining module 640, and the retrieving module 650.
The training module is used for acquiring an image data set, and the image data set comprises a plurality of images; performing a pre-processing operation on an image in the image dataset, the pre-processing operation including at least one of translation, rotation, scaling, and symmetric mapping; and training according to the preprocessed image data set to obtain the convolutional neural network.
In one embodiment, the convolutional neural network includes 13 convolutional layers and 3 fully-connected layers, a plurality of largest pooling layers are disposed in the convolutional layers, a classification layer is disposed behind the fully-connected layers, each convolutional layer adopts a convolutional kernel with a size of 3 × 3, and the classification layer is used for mapping multidimensional features extracted by the convolutional layers into n-dimensional probability distribution, wherein n is a positive integer greater than 1.
In one embodiment, the training module is further configured to delete the fully-connected layer and the classified layer.
In this embodiment, a plurality of view maps of the non-rigid three-dimensional model corresponding to different view angles are acquired, and the convolution features of each view map are extracted through the convolutional neural network. The image entropy of each view map is acquired, and the confidence corresponding to the convolution features of each view map is determined according to the image entropy. The similarity between the non-rigid three-dimensional model and each solid model in the three-dimensional model library is then determined according to the convolution features of the plurality of view maps and the corresponding confidences, so that the solid model matched with the non-rigid three-dimensional model can be retrieved according to the similarity, making three-dimensional model retrieval more accurate and faster.
In one embodiment, the obtaining module 610 includes a surrounding unit, a setting unit, and a rendering unit.
And the surrounding unit is used for surrounding the non-rigid three-dimensional model in a cube in the virtual space.
And the setting unit is used for respectively setting corresponding virtual cameras at the centers of six faces and eight vertexes of the cube, and each virtual camera points to the center of the non-rigid three-dimensional model.
And the rendering unit is used for rendering the view angle graph of the corresponding view angle through each virtual camera.
In this embodiment, the dimensionality of the non-rigid three-dimensional model is reduced by converting it into two-dimensional view maps that are convenient to process, so that the objects to be processed become regular and uniform, making it possible to fully transplant convolutional neural networks that perform excellently in the field of two-dimensional image processing.
In one embodiment, the confidence determining module 630 includes a gray level obtaining unit, a distribution feature determining unit, an image entropy calculating unit, and a confidence determining unit.
And the gray scale acquisition unit is used for acquiring the scale of each visual angle image and the pixel gray scale value of each pixel point.
And the distribution characteristic determining unit is used for determining the gray distribution characteristic of the corresponding view angle image according to the pixel gray value of each pixel point.
And the image entropy calculating unit is used for calculating the image entropy of each view map according to the scale and the gray distribution characteristics of each view map.
And the confidence coefficient determining unit is used for determining the confidence coefficient corresponding to the convolution characteristic of each view image according to the image entropy, and the image entropy and the confidence coefficient are in positive correlation.
In one embodiment, the similarity determination module 640 includes a euclidean distance calculation unit and a similarity determination unit.
And the Euclidean distance calculating unit is used for respectively calculating the Euclidean distance between each view angle image and the view angle image of the view angle corresponding to the entity model according to the convolution characteristics of the plurality of view angle images.
And the similarity determining unit is used for determining the similarity between the non-rigid three-dimensional model and the entity model according to the Euclidean distance of each view angle diagram and the corresponding confidence coefficient.
In one embodiment, the retrieving module 650 is further configured to output the similarity between the non-rigid three-dimensional model and each solid model in a predetermined order, and use the selected solid model as a solid model matching the non-rigid three-dimensional model.
In one embodiment, the Euclidean distance is calculated as follows:

d_i(x, y) = √( Σ_{k=1}^{j} (x_k − y_k)² )

wherein d_i(x, y) represents the Euclidean distance for the i-th view map, j is the dimension of the convolution features, x is the convolution feature of the view map of the non-rigid three-dimensional model, and y is the convolution feature of the corresponding view map of the solid model.

In one embodiment, the similarity between the non-rigid three-dimensional model and the solid model is measured as:

S = Σ_{i=1}^{n} w_i · d_i

wherein S represents the similarity, n is the number of acquired view maps, d_i is the Euclidean distance calculated for the i-th view map, and w_i is the confidence corresponding to the i-th view map.
In this embodiment, the Euclidean distance to the corresponding view map of the solid model is calculated from the convolution feature of each view map of the non-rigid three-dimensional model, and the similarity is then determined from these Euclidean distances and the corresponding confidences, so that the three-dimensional model can be retrieved more accurately and quickly.
Fig. 7 is a block diagram of an electronic device in one embodiment. As shown in fig. 7, in an embodiment, the electronic device 10 may be a server, or may be a terminal device such as a desktop computer or a notebook computer. The electronic device 10 may include one or more of the following components: a processor 11 and a memory 13, wherein one or more application programs may be stored in the memory 13 and configured to be executed by the one or more processors 11, the one or more programs configured to perform the convolutional neural network and multi-view based three-dimensional model retrieval method as described in the above embodiments.
The memory 13 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The memory 13 may be used to store instructions, programs, code sets, or instruction sets. The memory 13 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the above-described method embodiments, and the like. The data storage area may also store data created during use of the electronic device 10, and the like.
It is understood that the electronic device 10 may include more or fewer components than shown in the above block diagram, and is not limited thereto.
In one embodiment, a computer-readable storage medium is further provided, on which a computer program is stored, which when executed by a processor implements the method for retrieving a three-dimensional model based on a convolutional neural network and a multi-view map as described in the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the program is executed. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), or the like.
Any reference to memory, storage, database, or other medium as used herein may include non-volatile and/or volatile memory. Suitable non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (8)
1. A three-dimensional model retrieval method based on a convolutional neural network and a multi-view map is characterized by comprising the following steps:
acquiring a plurality of view angle images of the non-rigid three-dimensional model, wherein each view angle image corresponds to a different view angle;
extracting convolution characteristics of each view map through a convolution neural network;
acquiring the size of each visual angle image and the pixel gray value of each pixel point contained in the visual angle image;
determining the gray distribution characteristics of the corresponding view angle image according to the pixel gray values of the pixel points;
calculating the image entropy of each view image according to the size and the gray distribution characteristics of each view image;
determining a confidence coefficient corresponding to the convolution feature of each view image according to the image entropy, wherein the image entropy and the confidence coefficient are in positive correlation;
determining the similarity between the non-rigid three-dimensional model and each entity model in a three-dimensional model library according to the convolution characteristics of the multiple visual angle images and the corresponding confidence degrees;
retrieving a solid model matched with the non-rigid three-dimensional model according to the similarity;
prior to the acquiring the multiple perspective views of the non-rigid three-dimensional model, the method further comprises:
acquiring an image data set, wherein the image data set comprises a plurality of images;
performing a pre-processing operation on an image in the image dataset, the pre-processing operation including at least one of translation, rotation, scaling, and symmetric mapping;
and training according to the preprocessed image data set to obtain the convolutional neural network.
2. The method of claim 1, wherein the convolutional neural network comprises 13 convolutional layers in which a plurality of max pooling layers are disposed and 3 fully-connected layers after which a classification layer is disposed, each convolutional layer employing a convolutional kernel having a size of 3 x 3, the classification layer being for mapping multidimensional features extracted by the convolutional layers to a probability distribution of n-dimensions, where n is a positive integer greater than 1;
after training from the pre-processed image data set to obtain a convolutional neural network, the method further comprises: and deleting the full connection layer and the classification layer.
3. The method of claim 1, wherein the acquiring multiple perspective views of the non-rigid three-dimensional model comprises:
enclosing a non-rigid three-dimensional model in a cube in a virtual space;
respectively arranging corresponding virtual cameras at the centers of six faces and eight vertexes of the cube, wherein each virtual camera points to the center of the non-rigid three-dimensional model;
rendering a view angle diagram of the corresponding view angle through each virtual camera respectively.
4. The method of claim 1, wherein determining the similarity between the non-rigid three-dimensional model and each solid model in the three-dimensional model library according to the convolution features and the corresponding confidence degrees of the plurality of perspective views comprises:
respectively calculating the Euclidean distance between each view angle image and the view angle image of the view angle corresponding to the entity model according to the convolution characteristics of the plurality of view angle images;
determining the similarity between the non-rigid three-dimensional model and the entity model according to the Euclidean distance of each view angle diagram and the corresponding confidence coefficient;
the retrieving of the solid model matching the non-rigid three-dimensional model according to the similarity comprises:
and outputting the similarity of the non-rigid three-dimensional model and each solid model according to a preset sequence, and taking the selected solid model as a solid model matched with the non-rigid three-dimensional model.
5. The method of claim 4, wherein the Euclidean distance is calculated as follows:

d_i(x, y) = √( Σ_{k=1}^{j} (x_k − y_k)² )

wherein d_i(x, y) represents the Euclidean distance between the convolution features of the i-th view map of the non-rigid three-dimensional model and the i-th view map of the solid model, x is the convolution feature of the i-th view map of the non-rigid three-dimensional model, y is the convolution feature of the corresponding i-th view map of the solid model, j is the dimension of the convolution features x and y, x_k is the k-th dimension of the convolution feature x, and y_k is the k-th dimension of the convolution feature y; and the similarity between the non-rigid three-dimensional model and the solid model is measured as:

S = Σ_{i=1}^{n} w_i · d_i

wherein S represents the similarity, n is the number of acquired view maps, d_i is the calculated Euclidean distance for the i-th view map, and w_i is the confidence corresponding to the i-th view map.
6. A three-dimensional model retrieval device based on a convolutional neural network and a multi-view map is characterized by comprising:
the acquisition module is used for acquiring a plurality of view angle images of the non-rigid three-dimensional model, and each view angle image corresponds to different view angles;
the extraction module is used for extracting the convolution characteristics of each view map through a convolution neural network;
the confidence coefficient determining module is used for acquiring the size of each visual angle image and the pixel gray value of each contained pixel point; determining the gray distribution characteristics of the corresponding view angle image according to the pixel gray values of the pixel points; calculating the image entropy of each view image according to the size and the gray distribution characteristics of each view image; determining a confidence coefficient corresponding to the convolution feature of each view image according to the image entropy, wherein the image entropy and the confidence coefficient are in positive correlation;
the similarity determining module is used for determining the similarity between the non-rigid three-dimensional model and each entity model in the three-dimensional model library according to the convolution characteristics of the multiple visual angle images and the corresponding confidence coefficients;
the retrieval module is used for retrieving the entity model matched with the non-rigid three-dimensional model according to the similarity;
the training module is used for acquiring an image data set, and the image data set comprises a plurality of images; performing a pre-processing operation on an image in the image dataset, the pre-processing operation including at least one of translation, rotation, scaling, and symmetric mapping; and training according to the preprocessed image data set to obtain the convolutional neural network.
7. An electronic device comprising a memory and a processor, the memory having stored therein a computer program that, when executed by the processor, causes the processor to carry out the method of any of claims 1 to 5.
8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910329456.5A CN110147460B (en) | 2019-04-23 | 2019-04-23 | Three-dimensional model retrieval method and device based on convolutional neural network and multi-view map |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110147460A CN110147460A (en) | 2019-08-20 |
CN110147460B true CN110147460B (en) | 2021-08-06 |
Family
ID=67593885
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910329456.5A Expired - Fee Related CN110147460B (en) | 2019-04-23 | 2019-04-23 | Three-dimensional model retrieval method and device based on convolutional neural network and multi-view map |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110147460B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110543581B (en) * | 2019-09-09 | 2023-04-04 | 山东省计算中心(国家超级计算济南中心) | Multi-view three-dimensional model retrieval method based on non-local graph convolution network |
CN112434177B (en) * | 2020-11-27 | 2023-06-20 | 北京邮电大学 | Three-dimensional model retrieval method and device, electronic equipment and storage medium |
CN116643648B (en) * | 2023-04-13 | 2023-12-19 | 中国兵器装备集团自动化研究所有限公司 | Three-dimensional scene matching interaction method, device, equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130039569A1 (en) * | 2010-04-28 | 2013-02-14 | Olympus Corporation | Method and apparatus of compiling image database for three-dimensional object recognition |
CN108009222A (en) * | 2017-11-23 | 2018-05-08 | 浙江工业大学 | Method for searching three-dimension model based on more excellent view and depth convolutional neural networks |
CN108829701A (en) * | 2018-04-25 | 2018-11-16 | 鹰霆(天津)科技有限公司 | A kind of 3D model retrieval method based on sketch |
Non-Patent Citations (5)
Title |
---|
Three-dimensional model retrieval method based on Hpal information entropy fusion; Chen Junying, He Bo, Wang Xianhui; Journal of System Simulation; September 2012; Vol. 24, No. 9; full text *
Three-dimensional model classification and retrieval based on a convolutional neural network and a voting mechanism; Bai Jing, Si Qinglong, Qin Feiwei; Journal of Computer-Aided Design & Computer Graphics; February 2019; Vol. 31, No. 2; sections 1-3 *
Multi-view-based three-dimensional model retrieval system; Xu Lei; China Master's Theses Full-text Database, Information Science and Technology; April 15, 2019; full text *
Research on three-dimensional model retrieval based on feature fusion and manifold ranking; Chen Qiang; China Doctoral Dissertations Full-text Database, Information Science and Technology; March 15, 2017; full text *
Sketch-based three-dimensional model retrieval fusing information entropy and CNN; Liu Yujie, Song Yang, et al.; Journal of Graphics; August 2018; Vol. 39, No. 4; full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111062871B (en) | Image processing method and device, computer equipment and readable storage medium | |
CN109960742B (en) | Local information searching method and device | |
US11704357B2 (en) | Shape-based graphics search | |
CN110659582A (en) | Image conversion model training method, heterogeneous face recognition method, device and equipment | |
WO2022111069A1 (en) | Image processing method and apparatus, electronic device and storage medium | |
CN110147460B (en) | Three-dimensional model retrieval method and device based on convolutional neural network and multi-view map | |
CN106096542B (en) | Image video scene recognition method based on distance prediction information | |
CN110781911B (en) | Image matching method, device, equipment and storage medium | |
CN107291825A (en) | With the search method and system of money commodity in a kind of video | |
CN112085835B (en) | Three-dimensional cartoon face generation method and device, electronic equipment and storage medium | |
US20210117648A1 (en) | 3-dimensional model identification | |
CN113011253B (en) | Facial expression recognition method, device, equipment and storage medium based on ResNeXt network | |
CN111652054A (en) | Joint point detection method, posture recognition method and device | |
CN115630236A (en) | Global fast retrieval positioning method of passive remote sensing image, storage medium and equipment | |
CN112907569A (en) | Head image area segmentation method and device, electronic equipment and storage medium | |
CN112668608A (en) | Image identification method and device, electronic equipment and storage medium | |
CN114298997B (en) | Fake picture detection method, fake picture detection device and storage medium | |
CN111354076A (en) | Single-image three-dimensional part combined modeling method based on embedding space | |
Meng et al. | Merged region based image retrieval | |
CN111414802B (en) | Protein data characteristic extraction method | |
CN114677578A (en) | Method and device for determining training sample data | |
CN114266693A (en) | Image processing method, model generation method and equipment | |
CN113849679A (en) | Image retrieval method, image retrieval device, electronic equipment and storage medium | |
CN114519729A (en) | Image registration quality evaluation model training method and device and computer equipment | |
H'roura et al. | 3D objects descriptors methods: overview and trends |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | |
Granted publication date: 20210806 |