CN112785684A - Three-dimensional model reconstruction method based on local information weighting mechanism - Google Patents

Three-dimensional model reconstruction method based on local information weighting mechanism Download PDF

Info

Publication number
CN112785684A
CN112785684A (application CN202011270682.XA)
Authority
CN
China
Prior art keywords
dimensional
feature
local information
network
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011270682.XA
Other languages
Chinese (zh)
Other versions
CN112785684B (en)
Inventor
冷彪
杨量
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Guoxin Hongsi Technology Co ltd
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202011270682.XA priority Critical patent/CN112785684B/en
Publication of CN112785684A publication Critical patent/CN112785684A/en
Application granted granted Critical
Publication of CN112785684B publication Critical patent/CN112785684B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Image Generation (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention relates to a three-dimensional model reconstruction method based on a local information weighting mechanism, which comprises the following steps. Step one: perform feature extraction on an input picture using a backbone network to obtain two-dimensional image features. Step two: convert the two-dimensional image features obtained in step one into three-dimensional features. Step three: use a branch network based on an attention mechanism to generate a weight distribution map that measures the importance of the local information in the two-dimensional features. Step four: apply the weight distribution map obtained in step three to the two-dimensional image features extracted by the backbone network, thereby obtaining two-dimensional enhanced features with screened local information. Step five: perform a dimension conversion operation on the two-dimensional enhanced features obtained in step four to generate three-dimensional enhanced features, and combine them with the backbone network to generate the final three-dimensional voxel model prediction result.

Description

Three-dimensional model reconstruction method based on local information weighting mechanism
Technical Field
The invention relates to a three-dimensional model reconstruction method based on a local information weighting mechanism, and belongs to the field of deep learning and computer vision.
Background
View-based three-dimensional model reconstruction, i.e., reconstructing the three-dimensional model of an object from its 2D views, is a fundamental technology of computer vision. As a key technology of environment perception, it has broad application prospects in fields such as smart cities, 3D printing, virtual reality, moving-target detection and behavior analysis.
View-based three-dimensional model reconstruction methods can be divided into traditional methods and deep-learning-based methods. Traditional methods mainly recover the three-dimensional structure from geometric cues at pixel level, such as brightness changes and parallax, and fall roughly into shape-from-texture, shape-from-motion and shape-from-silhouette approaches. Deep-learning-based reconstruction uses image information to reconstruct the three-dimensional model directly, which is closer to the way human vision analyzes a scene. In recent years, the rapid development of deep learning in image processing has greatly improved the ability of computers to extract image information, and in turn the accuracy of view-based three-dimensional reconstruction; applying deep learning to image-based reconstruction tasks has markedly improved reconstruction quality. For reconstruction methods that use deep learning, the representation of the three-dimensional object is important. Three representations are in common use: point cloud models, patch (mesh) models and voxel models; three-dimensional models in different representations call for different neural network architectures and reconstruction strategies. Deep-learning-based reconstruction networks usually adopt an encoder-decoder design in which feature extraction is divided into two parts, extraction of two-dimensional image information and extraction of three-dimensional geometric features, connected by a spatial transformation operation, after which the final three-dimensional model prediction is obtained.
However, existing three-dimensional model reconstruction methods do not extract two-dimensional image information sufficiently, so local information is lost and details of the predicted model are easily missing.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to overcome the loss of local information, and hence of predicted model detail, caused by insufficient extraction of two-dimensional image information in existing three-dimensional model reconstruction methods, a two-branch network structure is provided that makes fuller use of two-dimensional image features and predicts a more accurate three-dimensional voxel model.
The technical scheme of the invention is as follows: a three-dimensional model reconstruction method based on a local information weighting mechanism comprising two sub-networks. The backbone network generates coarse three-dimensional features using an encoder-decoder structure; the branch network extracts and screens the two-dimensional image features at each level of the backbone network, and the screened local information is merged into the backbone network in the form of three-dimensional features to form a more accurate three-dimensional voxel model prediction result.
The designed algorithm applies a deep neural network; it is a three-dimensional voxel model reconstruction method based on a local information weighting mechanism and comprises the following steps:
the method comprises the following steps: image feature extraction using a backbone network
An improved 3D-R2N2 network structure is used as the backbone network to extract image features from the picture to be predicted. The 3D-R2N2 network consists of two parts, two-dimensional image feature extraction and three-dimensional model feature extraction; the two-dimensional image feature extraction part contains 6 convolution modules and reduces the feature size by pooling operations. After an image I is input into the network, feature maps of several different sizes are produced during two-dimensional feature extraction; the feature map F with the smallest size, having undergone the most convolution operations, is taken as the final output of the two-dimensional image feature extraction part. F has 512 channels and is 1/32 the size of the input image.
Step two: converting two-dimensional image features into three-dimensional model features
The feature F obtained in step one is two-dimensional. For feature dimension conversion, F is first flattened into a one-dimensional feature by a matrix compression operation; the one-dimensional feature is then further processed by two fully-connected operations and finally reshaped into a three-dimensional feature by a matrix dimension recombination operation. The converted three-dimensional feature is input into the three-dimensional model feature extraction part of the backbone network, where features are extracted by a 4-layer 3D convolution module while the three-dimensional feature is enlarged by unpooling operations, finally producing a three-dimensional feature of the corresponding size.
Step three: designing a branch network based on an attention mechanism
To make better use of the local object information contained in the two-dimensional image features, the method designs a branch network consisting of 8 2D convolution layers, which enhances and screens the important local information of the two-dimensional features in the backbone network. The branch network encodes the input image I and generates a matrix M = {m_1, m_2, ..., m_k} representing the local information sensitivity of each region of the image, and then converts M into a weight distribution map A = {a_1, a_2, ..., a_k} representing the importance of the local information in each region of the image, as shown in the following formula:
a_i = exp(m_i) / Σ_j exp(m_j),  i = 1, ..., k
where m_i denotes the sensitivity value of each region in the feature map to local information, and a_i denotes the local information importance weight of each region in the weight distribution map.
Step four: weighting local information of two-dimensional features of backbone network
The weight distribution map A obtained in step three is applied to an intermediate two-dimensional feature map F_m in the backbone network, yielding the feature F' with its important local information enhanced; the implementation is:
F' = F_m ⊙ A
In the above formula, F' is the two-dimensional feature after local information screening, and ⊙ denotes the element-wise product. After weighting by the weight distribution map A, the local information regions of F' that contribute more to model reconstruction are enhanced.
Step five: combining a branch network with a backbone network
The enhanced feature F' obtained in step four is converted into a three-dimensional enhanced feature G' by the dimension conversion method of step two. G' is combined with the three-dimensional feature G of corresponding size from the three-dimensional model feature extraction part of the backbone network, further enhancing regions that contain more important local information and producing a more comprehensive and accurate three-dimensional feature containing the local features; finally, the final voxel model V is generated by a sigmoid function operation.
The realization process is as follows:
V = sigmoid(G + G')
where sigmoid is the sigmoid activation function, V is the finally predicted three-dimensional voxel model, G is the three-dimensional feature generated by the backbone network, and G' is the three-dimensional enhanced feature generated by the branch network.
Compared with the prior art, the invention has the following advantages and effects:
(1) The method has important applications in the field of three-dimensional model reconstruction. It solves the loss of prediction-model accuracy caused by incomplete extraction of local image features in traditional three-dimensional model reconstruction networks.
(2) Compared horizontally with existing approaches, the invention applies a dual-branch three-dimensional model prediction network based on a local information weighting mechanism. Because the two branches process the input picture in different ways, the network obtains more comprehensive local information, can better perceive model details that are otherwise difficult to predict, and achieves improved prediction performance.
Drawings
FIG. 1 is a diagram of the overall network architecture of the method of the present invention;
FIG. 2 is a diagram of the improved 3D-R2N2 network architecture employed by the backbone network of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them; all other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
1. A deep neural network is a multi-parameter mapping function from a picture to a feature vector, denoted f_θ(·). For a given data set X = {x_1, x_2, ..., x_n} and its corresponding label set Y = {y_1, y_2, ..., y_n}, the feature vector corresponding to a datum is usually written f_i = f_θ(x_i).
2. The voxel model describes the position distribution of a three-dimensional model in three-dimensional space. It divides the space into unit voxel spaces of equal volume; the position of an object in space is then represented as the set of unit voxel spaces it occupies, and each unit voxel space carries its three-dimensional coordinates and occupancy information (1 for occupied, 0 for unoccupied). Because the voxel model is concise and regular, it is widely used in three-dimensional model reconstruction and generation.
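As a concrete illustration of this representation, the following minimal NumPy sketch builds a toy occupancy grid and lists the occupied unit voxel spaces (the 4x4x4 grid size and block placement are arbitrary assumptions for illustration):

    import numpy as np

    # Toy 4x4x4 occupancy grid: 1 marks unit voxel spaces occupied by the object.
    grid = np.zeros((4, 4, 4), dtype=np.uint8)
    grid[1:3, 1:3, 0:2] = 1                # a 2x2x2 block of the object

    # The object's position is the set of occupied unit voxel spaces,
    # each identified by its three-dimensional coordinates.
    occupied = np.argwhere(grid == 1)
    print(len(occupied), "of", grid.size, "unit voxel spaces occupied")  # 8 of 64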
The invention relates to a three-dimensional model reconstruction method based on a local information weighting mechanism; as shown in FIG. 1, the implementation steps are as follows:
the method comprises the following steps: image feature extraction using a backbone network
An improved 3D-R2N2 network structure is used as the backbone network to extract image features from the picture to be predicted. As shown in FIG. 2, the improved 3D-R2N2 network consists of two parts, two-dimensional image feature extraction and three-dimensional model feature extraction; the two-dimensional image feature extraction part contains 6 convolution modules and reduces the feature size by pooling operations. After an image I is input into the network, feature maps of several different sizes are produced during two-dimensional feature extraction; the feature F with the smallest size, having undergone the most convolution operations, is taken as the final output of the two-dimensional image feature extraction part. F has 512 channels and is 1/32 the size of the input image.
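The following PyTorch sketch shows one plausible form of this two-dimensional feature extraction part. Only the structure stated above is taken from the description (6 convolution modules, pooling for size reduction, a 512-channel output F at 1/32 of the input size); the kernel sizes, channel widths and the hypothetical name Encoder2D are assumptions for illustration:

    import torch
    import torch.nn as nn

    class Encoder2D(nn.Module):
        """Assumed 2D feature extractor: 6 convolution modules, 5 of which pool,
        so the final output F is 1/32 of the input size with 512 channels."""
        def __init__(self):
            super().__init__()
            widths = [3, 64, 96, 128, 256, 512, 512]
            blocks = []
            for i, (c_in, c_out) in enumerate(zip(widths[:-1], widths[1:])):
                layers = [nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True)]
                if i < 5:
                    layers.append(nn.MaxPool2d(2))   # pooling halves the feature size
                blocks.append(nn.Sequential(*layers))
            self.blocks = nn.ModuleList(blocks)

        def forward(self, img):
            feats, x = [], img
            for block in self.blocks:
                x = block(x)
                feats.append(x)          # feature maps of several different sizes
            return feats                 # feats[-1] is F: (B, 512, H/32, W/32)

    # e.g. a 128x128 input picture yields F of shape (1, 512, 4, 4)
    F = Encoder2D()(torch.randn(1, 3, 128, 128))[-1]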
Step two: converting two-dimensional image features into three-dimensional model features
The feature F obtained in step one is two-dimensional. For feature dimension conversion, F is first flattened into a one-dimensional feature by a matrix compression operation; the one-dimensional feature is then further processed by two fully-connected operations and finally reshaped into a three-dimensional feature by a matrix dimension recombination operation. The converted three-dimensional feature is input into the three-dimensional model feature extraction part of the backbone network, where features are extracted by a 4-layer 3D convolution module while the three-dimensional feature is enlarged by unpooling operations, finally producing a three-dimensional feature of the corresponding size.
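A minimal sketch of this conversion under stated assumptions follows: the 4x4 spatial size of F, the 1024-wide hidden layer, the 128-channel 4x4x4 three-dimensional feature and the class names are illustrative; the description fixes only the structure (matrix compression, two fully-connected operations, dimension recombination, then a 4-layer 3D convolution module with unpooling):

    import torch
    import torch.nn as nn

    class Lift2Dto3D(nn.Module):
        """Matrix compression -> two fully-connected operations -> recombination."""
        def __init__(self, in_dim=512 * 4 * 4, hidden=1024, c3d=128):
            super().__init__()
            self.c3d = c3d
            self.fc = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(inplace=True),
                nn.Linear(hidden, c3d * 4 * 4 * 4), nn.ReLU(inplace=True),
            )

        def forward(self, f):
            z = self.fc(f.flatten(1))              # compress the 2D feature to 1D
            return z.view(-1, self.c3d, 4, 4, 4)   # recombine into a 3D feature

    class Decoder3D(nn.Module):
        """Four 3D convolution modules; three unpooling steps enlarge 4^3 to 32^3."""
        def __init__(self, widths=(128, 64, 32, 16, 8)):
            super().__init__()
            layers = []
            for i, (c_in, c_out) in enumerate(zip(widths[:-1], widths[1:])):
                if i < 3:
                    layers.append(nn.Upsample(scale_factor=2))  # unpooling enlarges the size
                layers += [nn.Conv3d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True)]
            self.net = nn.Sequential(*layers)
            self.head = nn.Conv3d(widths[-1], 1, 3, padding=1)  # occupancy logits

        def forward(self, g):
            return self.head(self.net(g))          # (B, 1, 32, 32, 32)

    G = Decoder3D()(Lift2Dto3D()(torch.randn(1, 512, 4, 4)))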
Step three: designing a branch network based on an attention mechanism
To make better use of the local object information contained in the two-dimensional image features, the method designs a branch network consisting of 8 2D convolution layers, which enhances and screens the important local information of the two-dimensional features in the backbone network. The branch network encodes the input image I and generates a matrix M = {m_1, m_2, ..., m_k} representing the local information sensitivity of each region of the image, and then converts M into a weight distribution map A = {a_1, a_2, ..., a_k} representing the importance of the local information in each region of the image, as shown in the following formula:
a_i = exp(m_i) / Σ_j exp(m_j),  i = 1, ..., k
where m_i denotes the sensitivity value of each region in the feature map to local information, and a_i denotes the local information importance weight of each region in the weight distribution map.
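The branch network can be sketched as below. The 8 2D convolution layers are given by the description; the channel width, the stride schedule that makes A match the size of the backbone feature it weights, and the spatial softmax used to normalize the sensitivity matrix M into the weight distribution map A are assumptions for illustration:

    import torch
    import torch.nn as nn

    class AttentionBranch(nn.Module):
        """8 2D convolution layers encode image I into a sensitivity matrix M,
        then a softmax (assumed) turns M into the weight distribution map A."""
        def __init__(self, width=64):
            super().__init__()
            layers, c_in = [], 3
            for i in range(8):
                stride = 2 if i < 5 else 1   # downsample to 1/32 so A matches F (assumed)
                layers += [nn.Conv2d(c_in, width, 3, stride=stride, padding=1),
                           nn.ReLU(inplace=True)]
                c_in = width
            self.encode = nn.Sequential(*layers)
            self.to_m = nn.Conv2d(width, 1, 1)   # one sensitivity value m_i per region

        def forward(self, img):
            m = self.to_m(self.encode(img))          # M = {m_1, ..., m_k}
            b, _, h, w = m.shape
            a = torch.softmax(m.view(b, -1), dim=1)  # a_i = exp(m_i) / sum_j exp(m_j)
            return a.view(b, 1, h, w)                # weight distribution map A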
Step four: weighting local information of two-dimensional features of backbone network
The weight distribution map A obtained in step three is applied to an intermediate two-dimensional feature map F_m in the backbone network, yielding the feature F' with its important local information enhanced; the implementation is:
F' = F_m ⊙ A
In the above formula, F' is the two-dimensional feature after local information screening, and ⊙ denotes the element-wise product. After weighting by the weight distribution map A, the local information regions of F' that contribute more to model reconstruction are enhanced.
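Using the Encoder2D and AttentionBranch sketches above, step four reduces to an element-wise product broadcast over channels (shapes are illustrative):

    import torch

    img = torch.randn(1, 3, 128, 128)
    F_m = Encoder2D()(img)[-1]      # intermediate 2D feature map F_m: (1, 512, 4, 4)
    A = AttentionBranch()(img)      # weight distribution map A:       (1, 1, 4, 4)
    F_prime = F_m * A               # F' = F_m ⊙ A, broadcast over the 512 channels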
Step five: combining a branch network with a backbone network
The enhanced feature F' obtained in step four is converted into a three-dimensional enhanced feature G' by the dimension conversion method of step two. G' is combined with the three-dimensional feature G of corresponding size from the three-dimensional model feature extraction part of the backbone network, further enhancing regions that contain more important local information and producing a more comprehensive and accurate three-dimensional feature containing the local features; finally, the final voxel model V is generated by a sigmoid function operation.
The realization process is as follows:
V = sigmoid(G + G')
where sigmoid is the sigmoid activation function, V is the finally predicted three-dimensional voxel model, G is the three-dimensional feature generated by the backbone network, and G' is the three-dimensional enhanced feature generated by the branch network.
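Continuing the sketches above, step five lifts F' to the three-dimensional enhanced feature G' with the same kind of dimension conversion and fuses it additively before the sigmoid. Whether the conversion weights are shared between the two paths, and the fusion at the final size rather than inside the three-dimensional feature extraction part, are simplifying assumptions of this sketch:

    import torch

    lift_main, dec_main = Lift2Dto3D(), Decoder3D()        # backbone path
    lift_branch, dec_branch = Lift2Dto3D(), Decoder3D()    # branch path (separate weights assumed)

    G = dec_main(lift_main(F_m))                 # three-dimensional feature G from the backbone
    G_prime = dec_branch(lift_branch(F_prime))   # three-dimensional enhanced feature G'
    V = torch.sigmoid(G + G_prime)               # final voxel model V, occupancy in [0, 1]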
The method is applied to the three-dimensional voxel model reconstruction task for a single picture. Using the strong feature extraction capability of neural networks together with a novel method for extracting local feature information, it improves the efficiency with which the network model extracts local object information, improves the prediction accuracy of the three-dimensional voxel model, and reconstructs model details that are otherwise difficult to predict.
Portions of the invention not described in detail are well within the skill of the art.
The above examples are provided only for the purpose of describing the present invention, and are not intended to limit the scope of the present invention. The scope of the invention is defined by the appended claims. Various equivalent substitutions and modifications can be made without departing from the spirit and principles of the invention, and are intended to be within the scope of the invention.

Claims (6)

1. A three-dimensional model reconstruction method based on a local information weighting mechanism is characterized by comprising the following steps:
step one: performing feature extraction on an input picture by using a backbone network to obtain two-dimensional image features;
step two: converting the two-dimensional image features obtained in step one into three-dimensional features;
step three: generating, with a branch network based on an attention mechanism, a weight distribution map that measures the importance of the local information of the two-dimensional features;
step four: applying the weight distribution map obtained in step three to the two-dimensional image features extracted by the backbone network, thereby obtaining two-dimensional enhanced features with screened local information;
step five: performing a dimension conversion operation on the two-dimensional enhanced features obtained in step four to generate three-dimensional enhanced features, and combining them with the backbone network to generate a final three-dimensional voxel model prediction result.
2. The method for reconstructing a three-dimensional model based on a local information weighting mechanism according to claim 1, wherein step one is specifically as follows:
an improved 3D-R2N2 network is used as the backbone network to extract image features from the picture to be predicted; the improved 3D-R2N2 network comprises a two-dimensional image feature extraction part and a three-dimensional model feature extraction part, the two-dimensional image feature extraction part containing 6 convolution modules and reducing the feature size by pooling operations; after an image I is input into the improved 3D-R2N2 network, feature maps of several different sizes are produced during two-dimensional feature extraction, and the feature F with the smallest size, having undergone the most convolution operations, is taken as the final output of the two-dimensional image feature extraction part; F has 512 channels and is 1/32 the size of the input image.
3. The method for reconstructing a three-dimensional model based on a local information weighting mechanism according to claim 1, wherein in step two the feature dimension conversion from two-dimensional image features to three-dimensional features is as follows:
the feature F obtained in step one is two-dimensional; for feature dimension conversion, F is first flattened into a one-dimensional feature vector by a matrix compression operation, the one-dimensional feature vector is further processed by two fully-connected operations, and finally the one-dimensional feature is reshaped into a three-dimensional feature by a matrix size recombination operation; the converted three-dimensional feature is input into the three-dimensional model feature extraction part of the backbone network, where features are extracted by a 4-layer 3D convolution module while the three-dimensional feature is enlarged by unpooling operations, finally producing a three-dimensional feature of the corresponding size.
4. The method for reconstructing a three-dimensional model based on a local information weighting mechanism according to claim 1, wherein: in the third step, the operation process of the branch network is as follows:
the branch network consists of 8 2D convolution layers and is used for strengthening and screening important local information of two-dimensional features in the main network, the branch network encodes an input image I and generates a matrix M which represents the sensitivity of the local information of each region of the image, wherein the matrix M is M1,m2,...,mkK represents the number of regions, and then M is converted into a weight distribution map a which represents the importance degree of local information of each region of the image, wherein the weight distribution map a is { a }1,a2,...,akThe implementation process is shown as the following formula:
a_i = exp(m_i) / Σ_j exp(m_j),  i = 1, ..., k
where m_i denotes the sensitivity value of each region in the feature map to local information, and a_i denotes the local information importance weight of each region in the weight distribution map.
5. The method for reconstructing a three-dimensional model based on a local information weighting mechanism according to claim 1, wherein: in the fourth step, the local information weighting process is as follows:
applying the weight distribution graph A obtained by the third step on the two-dimensional image feature F in the backbone networkmAnd further obtaining the enhanced feature of the important local information, namely the enhanced feature F', the implementation process is as follows:
F’=Fm⊙A
in the above formula, F 'is a two-dimensional feature after the local information screening, which represents a dot product, and after the weighting of the weight distribution map a, the local information region contributing significantly to the model reconstruction in F' is enhanced.
6. The method for reconstructing a three-dimensional model based on a local information weighting mechanism according to claim 1, wherein in step five the backbone network and the branch network are merged as follows:
the enhanced feature F' obtained in step four is converted into a three-dimensional enhanced feature G' by the dimension conversion method of step two; G' is combined with the three-dimensional feature G of corresponding size from the three-dimensional model feature extraction part of the backbone network, further enhancing regions containing more important local information to generate a more comprehensive and accurate three-dimensional feature containing the local features; finally, the final voxel model V is generated by a sigmoid function operation;
the realization process is as follows:
V = sigmoid(G + G')
where sigmoid is the sigmoid activation function, V is the finally predicted three-dimensional voxel model, G is the three-dimensional feature generated by the backbone network, and G' is the three-dimensional enhanced feature generated by the branch network.
CN202011270682.XA 2020-11-13 2020-11-13 Three-dimensional model reconstruction method based on local information weighting mechanism Active CN112785684B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011270682.XA CN112785684B (en) 2020-11-13 2020-11-13 Three-dimensional model reconstruction method based on local information weighting mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011270682.XA CN112785684B (en) 2020-11-13 2020-11-13 Three-dimensional model reconstruction method based on local information weighting mechanism

Publications (2)

Publication Number Publication Date
CN112785684A (en) 2021-05-11
CN112785684B CN112785684B (en) 2022-06-14

Family

ID=75750510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011270682.XA Active CN112785684B (en) 2020-11-13 2020-11-13 Three-dimensional model reconstruction method based on local information weighting mechanism

Country Status (1)

Country Link
CN (1) CN112785684B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113822825A (en) * 2021-11-25 2021-12-21 电子科技大学成都学院 Optical building target three-dimensional reconstruction method based on 3D-R2N2

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060142984A1 (en) * 2003-01-31 2006-06-29 Jurgen Weese Method for the reconstruction of three-dimensional objects
CN110084093A (en) * 2019-02-20 2019-08-02 北京航空航天大学 The method and device of object detection and recognition in remote sensing images based on deep learning
CN110728219A (en) * 2019-09-29 2020-01-24 天津大学 3D face generation method based on multi-column multi-scale graph convolution neural network
CN111192200A (en) * 2020-01-02 2020-05-22 南京邮电大学 Image super-resolution reconstruction method based on fusion attention mechanism residual error network
CN111402405A (en) * 2020-03-23 2020-07-10 北京工业大学 Attention mechanism-based multi-view image three-dimensional reconstruction method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060142984A1 (en) * 2003-01-31 2006-06-29 Jurgen Weese Method for the reconstruction of three-dimensional objects
CN110084093A (en) * 2019-02-20 2019-08-02 北京航空航天大学 The method and device of object detection and recognition in remote sensing images based on deep learning
CN110728219A (en) * 2019-09-29 2020-01-24 天津大学 3D face generation method based on multi-column multi-scale graph convolution neural network
CN111192200A (en) * 2020-01-02 2020-05-22 南京邮电大学 Image super-resolution reconstruction method based on fusion attention mechanism residual error network
CN111402405A (en) * 2020-03-23 2020-07-10 北京工业大学 Attention mechanism-based multi-view image three-dimensional reconstruction method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
程鸣洋 et al.: "Research on Stereo Matching Networks Based on an Attention Mechanism", Acta Optica Sinica *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113822825A (en) * 2021-11-25 2021-12-21 电子科技大学成都学院 Optical building target three-dimensional reconstruction method based on 3D-R2N2
CN113822825B (en) * 2021-11-25 2022-02-11 电子科技大学成都学院 Optical building target three-dimensional reconstruction method based on 3D-R2N2

Also Published As

Publication number Publication date
CN112785684B (en) 2022-06-14

Similar Documents

Publication Publication Date Title
CN110390638B (en) High-resolution three-dimensional voxel model reconstruction method
CN111862101A (en) 3D point cloud semantic segmentation method under aerial view coding visual angle
CN113569979B (en) Three-dimensional object point cloud classification method based on attention mechanism
CN113792641B (en) High-resolution lightweight human body posture estimation method combined with multispectral attention mechanism
CN111028335B (en) Point cloud data block surface patch reconstruction method based on deep learning
Zhang et al. Point cloud completion via skeleton-detail transformer
CN113436237B (en) High-efficient measurement system of complicated curved surface based on gaussian process migration learning
CN113112583B (en) 3D human body reconstruction method based on infrared thermal imaging
CN114463511A (en) 3D human body model reconstruction method based on Transformer decoder
CN113096239B (en) Three-dimensional point cloud reconstruction method based on deep learning
CN112258626A (en) Three-dimensional model generation method and system for generating dense point cloud based on image cascade
CN112634438A (en) Single-frame depth image three-dimensional model reconstruction method and device based on countermeasure network
CN113362242A (en) Image restoration method based on multi-feature fusion network
CN113822825B (en) Optical building target three-dimensional reconstruction method based on 3D-R2N2
Kang et al. Competitive learning of facial fitting and synthesis using uv energy
CN116258757A (en) Monocular image depth estimation method based on multi-scale cross attention
CN112149802A (en) Image content conversion method with consistent semantic structure
Wang et al. PACCDU: Pyramid attention cross-convolutional dual UNet for infrared and visible image fusion
CN112785684B (en) Three-dimensional model reconstruction method based on local information weighting mechanism
Chen et al. Ground 3D object reconstruction based on multi-view 3D occupancy network using satellite remote sensing image
CN117115359A (en) Multi-view power grid three-dimensional space data reconstruction method based on depth map fusion
CN116863347A (en) High-efficiency and high-precision remote sensing image semantic segmentation method and application
CN116597071A (en) Defect point cloud data reconstruction method based on K-nearest neighbor point sampling capable of learning
CN116246010A (en) Human body three-dimensional reconstruction method based on image
CN110322548A (en) A kind of three-dimensional grid model generation method based on several picture parametrization

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230823

Address after: Room 231, 2nd Floor, Zone C, No. 8 College, No.18 Xueqing Road, Haidian District, Beijing, 100083

Patentee after: Beijing Guoxin Hongsi Technology Co.,Ltd.

Address before: 100191 No. 37, Haidian District, Beijing, Xueyuan Road

Patentee before: BEIHANG University

TR01 Transfer of patent right