CN111738306A

CN111738306A - Multi-view three-dimensional model retrieval method based on block convolution neural network

Info

Publication number: CN111738306A
Application number: CN202010487922.5A
Authority: CN
Inventors: 高赞; 邵煜翔; 程志勇; 陈达; 舒明雷; 聂礼强
Original assignee: Shandong Computer Science Center National Super Computing Center in Jinan; Shandong Institute of Artificial Intelligence
Current assignee: Shandong Computer Science Center National Super Computing Center in Jinan; Shandong Institute of Artificial Intelligence
Priority date: 2020-06-01
Filing date: 2020-06-01
Publication date: 2020-10-02
Anticipated expiration: 2040-06-01
Also published as: CN111738306B

Abstract

A multi-view three-dimensional model retrieval method based on a block convolutional neural network is used for mining intrinsic relations among views in the process of extracting view features by using block convolutional layers aiming at multi-view images. Different weights are distributed to each view according to cosine similarity between each view feature and the features after the maximal view pooling, and the distinguishing performance between the view features is utilized to obtain more distinguishing model features. When the loss function is generated, not only the model characteristic but also the view characteristic are considered, so that the network can be better constrained to learn. The multi-view three-dimensional model retrieval method based on the block convolutional neural network achieves excellent performance in the relevant three-dimensional model retrieval data set.

Description

Multi-view three-dimensional model retrieval method based on block convolution neural network

Technical Field

The invention relates to the field of three-dimensional vision, in particular to a multi-view three-dimensional model retrieval method based on a block convolution neural network.

Background

Three-dimensional vision is receiving increasing attention from researchers as three-dimensional representation technology and computer hardware capabilities develop. Three-dimensional vision is a more realistic description of the real world, containing spatial structure information of three-dimensional objects and characteristics of solid geometry and contour curves, compared to traditional two-dimensional images. Three-dimensional model retrieval is a research hotspot in the field of three-dimensional vision. The related research method can be divided into two stages, namely (1) a three-dimensional model retrieval method based on the traditional method and (2) a three-dimensional model retrieval method based on deep learning.

The three-dimensional model retrieval method based on the conventional method is to generate feature descriptors from information of the three-dimensional model, such as geometric moments, surface distribution, volume descriptors and surface geometry, and then to perform a measure of similarity between features using euclidean distances. However, in practical applications, the traditional method for rendering and feature extraction of the three-dimensional model is not only computationally intensive but also not ideal. These difficulties have limited the development of the related conventional methods. The three-dimensional model retrieval method based on deep learning comprises a model-based retrieval algorithm and a multi-view-based retrieval algorithm. The model-based search algorithm is to directly extract features on a three-dimensional model by using a deep neural network, and can be divided into a voxel-based method, a grid-based method and a point cloud-based method according to different representations of the model. Since deep learning has made a significant development in the related task of two-dimensional images, multi-view based three-dimensional model algorithms have been proposed. Images of a three-dimensional model at various angles are taken by a camera set at a plurality of angles, and the three-dimensional model is represented by two-dimensional images at different angles. And processing the strong performance of the two-dimensional image by means of a deep neural network such as AlexNet, VGG, GoogleNet, ResNet and the like, extracting the characteristics of each view, and fusing all the view characteristics to obtain the final model characteristics. In retrieving matches, Euclidean distances are used to make a measure of similarity between features.

In a multi-view-based three-dimensional model retrieval method, each three-dimensional model is collectively represented by multiple views. The existing method extracts the features of each view respectively, and then mines the relationship among the features of the views. The relationships between the views are not taken into account in the process of view feature extraction. How to mine the relationship between the views in the extraction process of the view features becomes a key for improving the performance of the retrieval algorithm.

Disclosure of Invention

The invention provides a multi-view three-dimensional model retrieval method based on a block convolutional neural network to overcome the defects of the above technology, which can realize the relation mining among multiple views according to the block characteristics and block coordinates of each view, mine the discriminative information among the views by distributing different weights of the views through a self-adaptive view weight layer, and extract more discriminative model characteristics by using a discriminative loss function constraint network in training.

The technical scheme adopted by the invention for overcoming the technical problems is as follows:

a multi-view three-dimensional model retrieval method based on a block convolutional neural network comprises the following steps:

a) rendering the three-dimensional model to obtain N two-dimensional views of the three-dimensional model;

b) inputting each two-dimensional view into a convolutional neural network, wherein each two-dimensional view obtains M feature blocks, and the feature dimension of each feature block is P dimension;

c) searching k adjacent blocks of each feature block in a P-dimensional feature space according to Euclidean distances, constructing an adjacent graph by using the central block and the k adjacent blocks, wherein each adjacent graph comprises k +1 vertexes and k undirected edges, and the k undirected edges are respectively connected with the central block and the 12 adjacent blocks, and adopting (P)_i,p_j-p_i) Edge features representing undirected edges, where p_iAs a central block, p_jFor the jth adjacent block, the three-dimensional coordinates (x, y, z) are used to represent the block in the xth row and the yth column of the zth two-dimensional view, and the edge feature between the two blocks is represented as (p)_i,p_j-p_i,c_i,c_j-c_i)，c_iThree-dimensional coordinates of the central block, c_jIs the three-dimensional coordinate of the jth adjacent block, the dimension of the edge feature is E dimension, and the three-dimensional coordinate is obtained by a formula

Performing convolution operation on the block to obtain new block characteristics p_i', where h is a nonlinear function, and N (i) is a neighborhood sample obtained by the ith sample through a KNN algorithm;

d) new block characteristics p_iUsing average pooling layer fusion block characteristics to obtain view characteristics of each two-dimensional view, using one-dimensional convolution to extract the relation between adjacent views, obtaining pooled characteristics g of the two-dimensional views through maximum view pooling, and obtaining pooled characteristics g of the two-dimensional views through a formula

Computing the pooled feature g and the jth view feature f_jCosine similarity s between_j(f_jG), where D is the characteristic dimension and k is the kth dimension, by means of the formula

Calculating the weight α of the cosine similarity of the jth two-dimensional view_jIn the formula s_i(f_iG) features g after pooling and the ith view feature f_iCosine similarity between them, by the formula f_j′＝α_j×f_jCalculating weighted view features f_j', by the formula

Calculating weighted model characteristics g';

e) according to the formula L_Dis＝β×L_model+γ×L_viewsCalculating a loss function L_DisIn the formula L_modelGenerating for model features gValue of loss function of L_viewsAs a view feature f_j' value of the resulting loss function, β is denoted L_modelWeight of the super parameter, gamma is L_viewsThe size of the weight is super-parameter,

in the formula

A loss function value generated for a view feature of a jth two-dimensional view;

f) for a given three-dimensional model, the three-dimensional model that most closely resembles the model is retrieved from the target dataset.

Further, the three-dimensional model in the step a) is a three-dimensional model generated by a computer, and the computer renders the three-dimensional model to obtain a plurality of two-dimensional views.

Further, the three-dimensional model in the step a) is a real-world object, and X cameras with different angles are arranged around the object to acquire two-dimensional views of the object at different angles.

Further, one camera is disposed every 30 degrees in the circumferential direction around the object, and 12 two-dimensional views are obtained by disposing 12 cameras.

Further, each two-dimensional view in step b) is represented by 7 × 7 block features after passing through the convolutional neural network, and the feature dimension of each block feature is 512.

Further, k is set to 12 and E is set to 1030 in step c).

Further, using 7 × 7 average pooling layer fusion block features in step d), a one-dimensional convolution with a convolution kernel size of 3 is used to extract the relationship between adjacent views.

Further, in step e), β is 0.5 and γ is 0.5.

Further, the loss value weight of each two-dimensional view is set to α in step e)_j′，

By the formula

Computing view characteristics f_j' loss function value generated.

Further, in step f), model characteristics of the retrieved model and all models in the query set are respectively extracted, and a formula is utilized

Calculating to obtain Euclidean distance D (q, p) between model feature p of the retrieved model and model features q of all models in the query set, wherein p in the formula_iFor the feature value of the i-th dimension in the model feature p of the model to be retrieved, q_iFor querying the characteristic value of ith dimension in model characteristic q of all models in the set, n is the characteristic dimension, and by the formula Redist (q)_i,g_j)＝1-prob(g_j|label＝q_i) × D (p, q) calculating the rearranged distance Redist (q)_i,g_j) In which prob (g)_j|label＝q_i) Is the probability of belonging to the same class as the retrieved model.

The invention has the beneficial effects that: by using block convolution layers for multi-view images, the intrinsic relations between views are mined in the extraction process of view features. Different weights are distributed to each view according to cosine similarity between each view feature and the features after the maximal view pooling, and the distinguishing performance between the view features is utilized to obtain more distinguishing model features. When the loss function is generated, not only the model characteristic but also the view characteristic are considered, so that the network can be better constrained to learn. The multi-view three-dimensional model retrieval method based on the block convolutional neural network achieves excellent performance in the relevant three-dimensional model retrieval data set.

Drawings

FIG. 1 is a flow chart of the present invention.

Detailed Description

The invention is further described below with reference to fig. 1.

a) and rendering the three-dimensional model to obtain N two-dimensional views of the three-dimensional model. The three-dimensional model may be a computer-generated three-dimensional model, the computer rendering the three-dimensional model resulting in a plurality of two-dimensional views. The three-dimensional model can also be a real world object, and X cameras with different angles are arranged around the object to acquire two-dimensional views of the object at different angles. Preferably, one camera is arranged at intervals of 30 degrees in the circumferential direction around the object, and 12 two-dimensional views are obtained by arranging 12 cameras.

b) In the previous multi-view method, the obtained multiple views are directly and simultaneously input into a convolutional neural network with shared weight to obtain the view characteristics of each view. In the method, after the block features are obtained, view features are not directly obtained by using average pooling, relationship mining needs to be carried out on all the block features, each two-dimensional view is input into a convolutional neural network, M feature blocks are obtained from each two-dimensional view, and the feature dimension of each feature block is P dimension. Each three-dimensional model is now represented collectively by 7 x 7 block features of 12 views, totaling 588 block features.

c) Aiming at the problem that the relation among multiple views cannot be mined in the process of extracting view features by the current method, a new neural network module is provided, and the neural network module is called block convolution. Relationship mining between multiple views can be implemented according to block features and block coordinates of each view. Firstly, searching k adjacent blocks of each feature block in a P-dimensional feature space according to Euclidean distance, wherein k can be set to be 12, constructing an adjacent map by using a central block and the k adjacent blocks, wherein each adjacent map comprises k +1 vertexes and k undirected edges, and the k undirected edges are respectively connected with the central block and the 12 adjacent blocks, and adopting (P)_i,p_j-p_i) Edge features representing undirected edges, where p_iAs a central block, p_jFor the jth neighbor block, considering that each block feature has its specific location in the view, the block is represented in the xth row and yth column of the z-th two-dimensional view using three-dimensional coordinates (x, y, z), and the edge feature between two blocks is represented by (p) by adding the coordinate information of the block_i,p_j-p_i,c_i,c_j-c_i)，c_iThree-dimensional coordinates of the central block, c_jFor the three-dimensional seating of the jth neighbor blockDimension E of edge feature is set to 1030 through formula

Performing convolution operation on the block to obtain new block characteristics p_i' where h is a non-linear function, and is implemented by convolution operation of 1 × 1, reducing the edge feature of 1030 dimension to 515 dimension, n (i) a neighborhood sample obtained by KNN algorithm for the ith sample, new block feature contains information between the current block and the neighboring block in the feature space.

d) New block characteristics p_iThe present invention contemplates adaptive view weight layers assigning different weights to each view, unlike the previous fusion method using max-view pooling, the invention obtains pooled feature g of two-dimensional views by max-view pooling, by formulating a formula

Calculating weighted model characteristics g';

e) previous methods all require the network to accurately classify the model features only, thereby producing loss function values. In this process, the view characteristics are not taken into account. Although each view feature cannot represent a model individually, it contains information about the model at a particular angle. Two classifiers are set in the network, which require accurate classification of model features and view features, respectively. The loss function is thus composed of two parts, one part of the loss function being calculated from the model features and the other part of the loss function being calculated from the view features. The obtained discriminative loss function can better constrain the network to train and extract more discriminative model features. According to the formula L_Dis＝β×L_model+γ×L_viewsCalculating a loss function L_DisIn the formula L_modelValue of the loss function generated for model feature g', L_viewsAs a view feature f_j' value of the resulting loss function, β is denoted L_modelWeight of the super parameter, gamma is L_viewsThe weight of the super parameter is calculated in the process of calculating L_viewsIn the process of (1), firstly, an averaging mode is used to take the average value of the loss function value of each view,

in the formula

The value of the loss function generated for the view feature of the jth two-dimensional view is preferably β ═ 0.5 and γ ═ 0.5.

By using block convolution layers for multi-view images, the intrinsic relations between views are mined in the extraction process of view features. Different weights are distributed to each view according to cosine similarity between each view feature and the features after the maximal view pooling, and the distinguishing performance between the view features is utilized to obtain more distinguishing model features. When the loss function is generated, not only the model characteristic but also the view characteristic are considered, so that the network can be better constrained to learn. The multi-view three-dimensional model retrieval method based on the block convolutional neural network achieves excellent performance in the relevant three-dimensional model retrieval data set.

Each view is assigned a different weight, with smaller weights indicating that the view features are less similar to the maximally pooled features, indicating that the view is more difficult to resolve. If the network can focus more on these views that are difficult to distinguish, more discriminative information can be obtained, which is very beneficial to the retrieval of the model. Weighting the loss value of each view to

Indicating that views with low view weights are assigned higher penalty weights and views with high view weights are assigned smaller penalty weights, thereby allowing the indistinguishable views to have a greater effect on feature extraction step e) to set the penalty weight of each two-dimensional view to α_j′，

By the formula

Computing view characteristics f_j' loss function value generated. Given any one three-dimensional model, the three-dimensional model most similar to the model is retrieved in the target dataset. The process of retrieval is as follows: and respectively extracting model features of the retrieved model and all models in the query set, calculating Euclidean distances pairwise, sequencing the models in the query set according to the Euclidean distances, and returning a query result according to the sequence. Thus, in step f) the models of the retrieved model and all models in the query set are extracted separatelyCharacteristic of, using a formula

Calculating to obtain Euclidean distance D (q, p) between model feature p of the retrieved model and model features q of all models in the query set, wherein p in the formula_iFor the feature value of the i-th dimension in the model feature p of the model to be retrieved, q_iThe feature value of the ith dimension in the model feature q of all models in the query set is defined, and n is the feature dimension. In the retrieval task, the initially obtained sequencing result can be rearranged so as to improve the effect. In the present invention, the classification information of the model features is reordered from the initial ranking results by means of a trained network. First, the network classifies the retrieved model and determines that it belongs to a certain class. Then, all models in the query dataset are classified, resulting in a probability that they belong to each class respectively. By the formula Redist (q)_i,g_j)＝1-prob(g_j|label＝q_i) × D (p, q) calculating the rearranged distance Redist (q)_i,g_j) In which prob (g)_j|label＝q_i) Is the probability of belonging to the same class as the retrieved model. This means that the higher the probability of belonging to the same class, the smaller the weight of their euclidean distance will be compared to the model with lower probability, so that the euclidean distance between the two becomes smaller. By means of the classification capability of the network, the effect of network retrieval is improved.

Claims

1. A multi-view three-dimensional model retrieval method based on a block convolution neural network is characterized by comprising the following steps:

c) searching k adjacent blocks of each feature block in a P-dimensional feature space according to Euclidean distances, constructing an adjacent graph by using the central block and the k adjacent blocks, wherein each adjacent graph comprises k +1 vertexes and k undirected edges, and the k undirected edges are respectively connected with the central block and the k undirected edges12 neighboring blocks, using (p)_i,p_j-p_i) Edge features representing undirected edges, where p_iAs a central block, p_jFor the jth adjacent block, the three-dimensional coordinates (x, y, z) are used to represent the block in the xth row and the yth column of the zth two-dimensional view, and the edge feature between the two blocks is represented as (p)_i,p_j-p_i,c_i,c_j-c_i)，c_iThree-dimensional coordinates of the central block, c_jIs the three-dimensional coordinate of the jth adjacent block, the dimension of the edge feature is E dimension, and the three-dimensional coordinate is obtained by a formula

Calculating weighted model characteristics g';

e) according to the formula L_Dis＝β×L_model+γ×L_viewsCalculating a loss function L_DisIn the formula L_modelValue of the loss function generated for model feature g', L_viewsAs a view feature f_j' value of the resulting loss function, β is denoted L_modelWeight of the super parameter, gamma is L_viewsThe size of the weight is super-parameter,

in the formula

2. The block convolutional neural network-based multi-view three-dimensional model retrieval method of claim 1, wherein: the three-dimensional model in the step a) is a three-dimensional model generated by a computer, and the computer renders the three-dimensional model to obtain a plurality of two-dimensional views.

3. The block convolutional neural network-based multi-view three-dimensional model retrieval method of claim 1, wherein: the three-dimensional model in the step a) is a real-world object, and X cameras with different angles are arranged around the object to acquire two-dimensional views of the object at different angles.

4. The block convolutional neural network-based multi-view three-dimensional model retrieval method of claim 3, wherein: one camera is arranged every 30 degrees in the circumferential direction around the object, and 12 two-dimensional views are obtained by arranging 12 cameras.

5. The block convolutional neural network-based multi-view three-dimensional model retrieval method of claim 1, wherein: each two-dimensional view in the step b) is represented by 7 multiplied by 7 block features after passing through the convolutional neural network, and the feature dimension of each block feature is 512.

6. The block convolutional neural network-based multi-view three-dimensional model retrieval method of claim 1, wherein: in step c) k is set to 12 and E is set to 1030.

7. The block convolutional neural network-based multi-view three-dimensional model retrieval method of claim 1, wherein: using 7 × 7 average pooling layer fusion block features in step d), extracting the relationship between adjacent views using a one-dimensional convolution with a convolution kernel size of 3.

8. The block convolutional neural network-based multi-view three-dimensional model retrieval method of claim 1, wherein: in step e), β is 0.5 and γ is 0.5.

9. The method for retrieving the multi-view three-dimensional model based on the block convolutional neural network as claimed in claim 1, wherein the weight of the loss value of each two-dimensional view is set to α in step e)_j′，

By the formula

Computing view characteristics f_j' loss function value generated.

10. The block convolutional neural network-based multi-view three-dimensional model retrieval method of claim 1, wherein: in step f), the model characteristics of the retrieved model and all models in the query set are respectively extracted, and a formula is utilized

Is calculated to obtain the quiltSearching Euclidean distance D (q, p) between model characteristic p of model and model characteristic q of all models in query set, wherein p is_iFor the feature value of the i-th dimension in the model feature p of the model to be retrieved, q_iFor querying the characteristic value of ith dimension in model characteristic q of all models in the set, n is the characteristic dimension, and by the formula Redist (q)_i,g_j)＝1-prob(g_j|label＝q_i) × D (p, q) calculating the rearranged distance Redist (q)_i,g_j) In which prob (g)_j|label＝q_i) Is the probability of belonging to the same class as the retrieved model.