CN113822825A - Optical building target three-dimensional reconstruction method based on 3D-R2N2 - Google Patents

Optical building target three-dimensional reconstruction method based on 3D-R2N2

Info

Publication number
CN113822825A
Authority
CN
China
Prior art keywords
layer
input end
convolution layer
convolution
dimensional
Prior art date
Legal status
Granted
Application number
CN202111409413.1A
Other languages
Chinese (zh)
Other versions
CN113822825B (en)
Inventor
邹倩颖
郭雪
蔡雨静
喻淋
Current Assignee
Chengdu College of University of Electronic Science and Technology of China
Original Assignee
Chengdu College of University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by Chengdu College of University of Electronic Science and Technology of China filed Critical Chengdu College of University of Electronic Science and Technology of China
Priority to CN202111409413.1A priority Critical patent/CN113822825B/en
Publication of CN113822825A publication Critical patent/CN113822825A/en
Application granted granted Critical
Publication of CN113822825B publication Critical patent/CN113822825B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T5/70
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T5/40 Image enhancement or restoration by the use of histogram techniques
    • G06T5/90
    • G06T7/344 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving models
    • G06T2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]

Abstract

The invention discloses a 3D-R2N2-based method for three-dimensional reconstruction of optical building targets, and relates to the technical field of three-dimensional reconstruction. The method comprises: acquiring an optical image and preprocessing it; constructing a 3D-R2N2 network and inputting the preprocessed optical image into the constructed network, the 3D-R2N2 network including a CNN module; performing feature extraction and encoding on the optical image and processing it into a low-dimensional feature vector; sending the low-dimensional feature vector into a 3D-LSTM unit to obtain a three-dimensional grid structure composed of voxels; inputting the three-dimensional grid structure into a decoder, which converts the voxels into a three-dimensional probability matrix; and performing pixel reconstruction through the three-dimensional probability matrix, thereby completing the three-dimensional reconstruction of the optical building target. The method stabilizes model training, improves convergence, raises the accuracy of the reconstructed model, and recovers more accurate images with a good visual effect.

Description

Optical building target three-dimensional reconstruction method based on 3D-R2N2
Technical Field
The invention relates to the technical field of three-dimensional reconstruction, in particular to a 3D-R2N2-based optical building target three-dimensional reconstruction method.
Background
Three-dimensional reconstruction refers to establishing a mathematical model of a target object that is suitable for computer representation and processing. With the development of modern science and technology, building three-dimensional reconstruction has attracted wide attention, and the construction of three-dimensional models has become one of the key elements of urban geospatial data frameworks; how to construct three-dimensional models of urban areas quickly, automatically and accurately, especially for buildings with complex shapes, is currently a hot research problem in many fields. In 2015, three-dimensional reconstruction networks based on voxel representation (3D renderets) were first proposed, but these networks suffer from matching problems such as texture defects, specular reflection and baselines. In 2016, the 3D-R2N2 method was proposed, which mainly solves the problem of object feature matching, but its reconstruction accuracy and efficiency are not high; the WarpNet framework based on convolutional neural networks achieves reconstruction quality similar to that of supervised methods, but the targets it reconstructs are distorted; and the MarrNet model, trained end-to-end on real images, suffers from complex computation and a lack of finer geometric shapes. In 2018, images containing complex objects were reconstructed in three dimensions with a voxel-level reconstruction algorithm, but for low-resolution images its reconstruction accuracy is low. In 2017, the B-Rep algorithm was adopted for three-dimensional reconstruction; it is a polyhedron-oriented algorithm and is only suitable for simple polyhedra. Traditional three-dimensional reconstruction thus suffers from low modeling efficiency, poor visual quality of the models, and low modeling precision in texture-missing regions, which places higher requirements on reconstruction algorithms.
Disclosure of Invention
Aiming at the defects in the prior art, the optical building target three-dimensional reconstruction method based on 3D-R2N2 solves the problems of low modeling efficiency, poor model visual effect and low modeling precision of texture missing areas in the prior art.
In order to achieve the purpose of the invention, the invention adopts the technical scheme that:
the method for three-dimensional reconstruction of the optical building target based on the 3D-R2N2 is provided, and comprises the following steps:
s1, acquiring an optical image and preprocessing the optical image;
s2, constructing a 3D-R2N2 network, and inputting the preprocessed optical image into the constructed 3D-R2N2 network; the 3D-R2N2 network comprises an image extraction module, a pyramid pooling layer, a CNN module and a 3D-LSTM unit which are connected in sequence;
s3, adjusting the size of the input optical image to be a uniform size through the pyramid pooling layer;
s4, extracting features from the uniformly sized optical image by using the CNN module (a deep residual variant) of the 3D-R2N2 network, and encoding the extracted features;
s5, performing one-dimensional convolution on the coded features, and compressing the features into 1024-dimensional feature vectors, namely low-dimensional feature vectors, through a coder;
s6, sending the low-dimensional feature vector into a 3D-LSTM unit to obtain a three-dimensional grid structure; wherein the three-dimensional grid structure comprises voxels;
s7, inputting the three-dimensional grid structure into a decoder, and improving the hidden state resolution of the three-dimensional grid structure through the decoder until the target output resolution is reached;
s8, converting the three-dimensional grid structure reaching the target output resolution into the existence probability of the voxel at the voxel coordinate point by using a cross entropy loss function, and processing the probability into a Bernoulli distribution form;
s9, establishing the existence probability of the voxels in the Bernoulli distribution form in the voxel coordinate points into a three-dimensional probability matrix;
and S10, performing pixel reconstruction through the three-dimensional probability matrix, namely completing the three-dimensional reconstruction of the optical building target.
Further, the specific method of preprocessing in step S1 is:
s1-1, according to the formula:

min J(u) = min ∫_Ω |∇u| dΩ = min ∫_Ω √(u_x² + u_y²) dx dy

obtaining the minimized total variation min J(u); wherein ∫_Ω(·) dΩ denotes the differential taken over Ω, Ω is the definition domain of the pixel points, u is the original sharp analog-noise high-frequency image, i.e. the optical image, x and y are the pixel coordinates in that image, ∇u is the differential operator applied to u, and u_x and u_y are the derivatives with respect to the pixel coordinates x and y;
s1-2, carrying out noise reduction processing on the optical image by utilizing the minimized total variation;
s1-3, processing the optical image after noise reduction into a vertical image, and graying the vertical image to obtain a gray image;
s1-4, extending all the area spaces with concentrated gray scales in the gray scale image to all the gray scale area space ranges to obtain a non-uniform extension and stretching gray scale image;
and S1-5, redistributing the pixel values of the non-uniform extension stretched gray-scale image to finish preprocessing.
Further, in step S2, the CNN module includes 12 convolutional layers, 5 residual connection layers, 4 bottleneck layers, and 1 transition Layer, where the residual connection layers include a first convolutional Layer, a first Leaky_Relu activation function Layer, a second Leaky_Relu activation function Layer, a third Leaky_Relu activation function Layer, and a PC Layer path control Layer, the first bottleneck Layer to the fourth bottleneck Layer each include a BN normalization Layer and a ReLU activation function Layer, and the first convolutional Layer has a 7 × 7 structure, and the second convolutional Layer to the thirteenth convolutional Layer each have a 3 × 3 structure;
the first convolution Layer, the first Leaky _ Relu activation function Layer, the second convolution Layer, the third convolution Layer, the fourth convolution Layer, the fifth convolution Layer, the sixth convolution Layer, the seventh convolution Layer, the eighth convolution Layer, the second Leaky _ Relu activation function Layer, the ninth convolution Layer, the first bottleneck Layer, the tenth convolution Layer, the second bottleneck Layer, the eleventh convolution Layer, the third bottleneck Layer, the twelfth convolution Layer, the fourth bottleneck Layer, the thirteenth convolution Layer, the transition Layer, the third Leaky _ Relu activation function Layer and the PC Layer access control Layer are connected in sequence; wherein the first convolution Layer is an input Layer, and the PC Layer channel control Layer is an output Layer;
the output end of the second convolution layer is respectively connected with the input end of the third convolution layer, the input end of the fourth convolution layer, the input end of the fifth convolution layer, the input end of the sixth convolution layer and the input end of the seventh convolution layer; the output end of the third convolution layer is respectively connected with the input end of the fourth convolution layer, the input end of the fifth convolution layer, the input end of the sixth convolution layer and the input end of the seventh convolution layer; the output end of the fourth convolution layer is respectively connected with the input end of the fifth convolution layer, the input end of the sixth convolution layer and the input end of the seventh convolution layer; the output end of the fifth convolution layer is respectively connected with the input end of the sixth convolution layer and the input end of the seventh convolution layer; the output end of the sixth convolution layer is connected with the input end of the seventh convolution layer;
the output end of the ninth convolution layer is respectively connected with the input end of the first bottleneck layer, the input end of the second bottleneck layer, the input end of the third bottleneck layer, the input end of the fourth bottleneck layer and the input end of the transition layer; the output end of the tenth convolution layer is respectively connected with the input end of the second bottleneck layer, the input end of the third bottleneck layer, the input end of the fourth bottleneck layer and the input end of the transition layer; the eleventh convolution layer is respectively connected with the input end of the third bottleneck layer, the input end of the fourth bottleneck layer and the input end of the transition layer; the output end of the twelfth convolution layer is respectively connected with the input end of the fourth bottleneck layer and the input end of the transition layer; the output end of the thirteenth convolution layer is connected to the input end of the transition layer.
Further, the 3D-LSTM unit in step S2 includes several basic modules; the basic module comprises a first current full-connection layer, and the output end of the first current full-connection layer is respectively connected with the input end of the first full-connection layer, the input end of the second full-connection layer and the input end of the third full-connection layer; the output end of the first full connection layer is connected with the output end of the first 3 multiplied by 3 convolution layer and is connected with the input end of the first adder; the output end of the second full connection layer is connected with the output end of the second 3 multiplied by 3 convolution layer and is connected with the input end of the second adder; the output end of the third full connection layer is connected with the output end of the third 3 multiplied by 3 convolutional layer and is connected with the input end of the third adder; the output end of the second adder and the output end of the third adder are respectively connected with the input end of the first multiplier, and the output end of the first multiplier is connected with the input end of the fourth adder; the input end of the first 3 × 3 × 3 convolutional layer, the input end of the second 3 × 3 × 3 convolutional layer and the input end of the third 3 × 3 × 3 convolutional layer are respectively connected with the output end of the hidden layer at the first previous moment; the output end of the first adder is respectively connected with the input end of the second multiplier and the storage unit at the previous moment; the output end of the second multiplier is connected with the output end of the fourth adder; the output end of the fourth adder is respectively connected with the hidden layer at the current moment and the storage unit at the first current moment.
Further, the pyramid pooling layers in step S2 include a feature map 16-block division layer, a feature map 4-block division layer, and a feature map 1-block division layer.
Further, the decoder in step S7 includes several sets of two-dimensional vector matrices, a pooling layer, a one-dimensional vector matrix, a full-link layer, and a Softmax activation layer.
Further, the formula of the cross entropy loss function in step S8 is:

L(p, y) = −Σ_(i,j,k) [ y_(i,j,k) · lg p_(i,j,k) + (1 − y_(i,j,k)) · lg(1 − p_(i,j,k)) ]

wherein L(p, y) is the cross-entropy loss function, (i, j, k) is the pixel (voxel) coordinate, y_(i,j,k) is the real voxel point, p_(i,j,k) is the voxel probability, and lg is the logarithmic function with base 10.
The invention has the beneficial effects that:
1. designing a CNN module with dense links, wherein each convolution layer of the CNN module is connected with a subsequent convolution layer, so that each convolution layer can increase the feature mapping of the corresponding layer, the subsequent convolution layer can acquire the information of the previous convolution layer, and the information of the first convolution layer can be acquired even the last convolution layer, thereby fully utilizing the number of channels and completing information transmission;
2. a bottleneck layer structure is added between convolution layers of the CNN module which is densely linked, so that the dimensionality of a convolution result can be reduced, the dimensionality increase caused by acquiring a large amount of characteristic information by each convolution layer can be relieved, the dimensionality is reduced, the training is stable, and the coding efficiency is improved;
3. a pyramid pooling layer is added between the extraction module and the CNN module, so that the size of the image can be unified, the image classification effect is improved by fusing multi-scale features, and the target recognition rate is improved;
4. the algorithm can improve the characteristic extraction effect, so that the details of an image reconstruction model are more perfect, and the accuracy is higher;
5. compared with the traditional algorithm, the method can shorten the registration time, improve the convergence of the registration result and reduce the complexity of the algorithm.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a block diagram of a CNN module according to the present invention;
FIG. 3 is a diagram of a 3D-LSTM unit structure;
FIG. 4 is a block diagram of a decoder;
FIG. 5 is an optical image;
FIG. 6 is a modeling diagram of the B-Rep algorithm;
FIG. 7 is a modeling diagram of a prior 3D-R2N2 algorithm;
FIG. 8 is a modeling diagram of the 3D-R2N2 algorithm of the present invention;
FIG. 9 is a graph of the reconstruction result of the 3D-R2N2 algorithm of the present invention;
fig. 10 is a reconstruction result diagram of the voxel reconstruction algorithm.
Detailed Description
The following description of the embodiments of the present invention is provided to facilitate understanding by those skilled in the art, but it should be understood that the invention is not limited to the scope of these embodiments; for those skilled in the art, various changes may be made without departing from the spirit and scope of the invention as defined in the appended claims, and everything produced using the inventive concept is protected.
As shown in FIG. 1, the optical building target three-dimensional reconstruction method based on 3D-R2N2 comprises the following steps:
s1, acquiring an optical image and preprocessing the optical image;
s2, constructing a 3D-R2N2 network, and inputting the preprocessed optical image into the constructed 3D-R2N2 network; the 3D-R2N2 network comprises an image extraction module, a pyramid pooling layer, a CNN module and a 3D-LSTM unit which are connected in sequence;
s3, adjusting the size of the input optical image to be a uniform size through the pyramid pooling layer;
s4, extracting features from the uniformly sized optical image by using the CNN module (a deep residual variant) of the 3D-R2N2 network, and encoding the extracted features;
s5, performing one-dimensional convolution on the coded features, and compressing the features into 1024-dimensional feature vectors, namely low-dimensional feature vectors, through a coder;
s6, sending the low-dimensional feature vector into a 3D-LSTM unit to obtain a three-dimensional grid structure; wherein the three-dimensional grid structure comprises voxels;
s7, inputting the three-dimensional grid structure into a decoder, and improving the hidden state resolution of the three-dimensional grid structure through the decoder until the target output resolution is reached;
s8, converting the three-dimensional grid structure reaching the target output resolution into the existence probability of the voxel at the voxel coordinate point by using a cross entropy loss function, and processing the probability into a Bernoulli distribution form;
s9, establishing the existence probability of the voxels in the Bernoulli distribution form in the voxel coordinate points into a three-dimensional probability matrix;
and S10, performing pixel reconstruction through the three-dimensional probability matrix, namely completing the three-dimensional reconstruction of the optical building target.
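For illustration only, the data flow of steps S2 to S10 can be summarized by the following minimal Python sketch; the objects `encoder`, `lstm3d` and `decoder` are hypothetical stand-ins for the CNN module, the 3D-LSTM unit and the decoder described below (assumed to behave like PyTorch modules operating on tensors), and the 0.5 binarization threshold is likewise an assumption, not a value taken from the patent:

    def reconstruct_building(images, encoder, lstm3d, decoder, threshold=0.5):
        """Illustrative end-to-end flow: encode each view, update the 3-D grid state,
        decode voxel occupancy probabilities, then binarize them (steps S2-S10)."""
        state = None
        for view in images:                    # one preprocessed optical image per view
            feat = encoder(view.unsqueeze(0))  # S3-S5: 1024-dimensional low-dimensional feature vector
            h, state = lstm3d(feat, state)     # S6: update the three-dimensional grid structure
        prob = decoder(h)                      # S7-S9: three-dimensional probability matrix
        return (prob > threshold).float()      # S10: reconstruct voxels from the probabilities
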
The specific method of preprocessing in step S1 is:
s1-1, according to the formula:

min J(u) = min ∫_Ω |∇u| dΩ = min ∫_Ω √(u_x² + u_y²) dx dy

obtaining the minimized total variation min J(u); wherein ∫_Ω(·) dΩ denotes the differential taken over Ω, Ω is the definition domain of the pixel points, u is the original sharp analog-noise high-frequency image, i.e. the optical image, x and y are the pixel coordinates in that image, ∇u is the differential operator applied to u, and u_x and u_y are the derivatives with respect to the pixel coordinates x and y;
s1-2, carrying out noise reduction processing on the optical image by utilizing the minimized total variation;
s1-3, processing the optical image after noise reduction into a vertical image, and graying the vertical image to obtain a gray image;
s1-4, extending all the area spaces with concentrated gray scales in the gray scale image to all the gray scale area space ranges to obtain a non-uniform extension and stretching gray scale image;
and S1-5, redistributing the pixel values of the non-uniform extension stretched gray-scale image to finish preprocessing.
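The noise reduction of steps S1-1 and S1-2 above can be sketched as a simple gradient-descent minimization of the total variation; the following Python/NumPy code is a minimal illustration under an assumed step size, iteration count and regularization weight (none of these values are given in the patent):

    import numpy as np

    def tv_denoise(u0, weight=0.1, step=0.2, iters=100):
        """Reduce noise by gradient descent on weight*TV(u) + 0.5*||u - u0||^2."""
        u = u0.astype(float).copy()
        for _ in range(iters):
            ux = np.gradient(u, axis=1)          # derivative w.r.t. pixel x coordinate
            uy = np.gradient(u, axis=0)          # derivative w.r.t. pixel y coordinate
            mag = np.sqrt(ux ** 2 + uy ** 2) + 1e-8  # |grad u|, avoiding division by zero
            # divergence of the normalized gradient field (gradient of the TV term)
            div = np.gradient(ux / mag, axis=1) + np.gradient(uy / mag, axis=0)
            u += step * (weight * div - (u - u0))
        return u
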
The cumulative frequency sum of the histogram entries is

p_i = n_i / n

s_k = Σ_{i=0}^{k} p_i = Σ_{i=0}^{k} n_i / n,  k = 0, 1, …, L − 1

wherein n_i is the number of pixels at gray level i, n is the total number of pixels, and L is the number of gray levels; according to the formula:

t_k = int[(L − 1) · s_k + 0.5]

s_k is rounded; where int is the rounding function.
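Steps S1-3 to S1-5, together with the cumulative-frequency and rounding formulas above, correspond to histogram equalization of the gray image; a compact sketch, assuming an 8-bit grayscale input, is:

    import numpy as np

    def equalize_histogram(gray, levels=256):
        """Map gray levels through the rounded cumulative distribution s_k (gray: uint8 array)."""
        hist = np.bincount(gray.ravel(), minlength=levels)            # n_i for each gray level i
        s = np.cumsum(hist) / gray.size                               # s_k = sum(n_i) / n
        mapping = np.floor((levels - 1) * s + 0.5).astype(np.uint8)   # int[(L - 1) * s_k + 0.5]
        return mapping[gray]                                          # redistribute the pixel values
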
As shown in fig. 2, the CNN module in step S2 includes 12 convolutional layers, 5 residual connection layers, 4 bottleneck layers, and 1 transition Layer, where the residual connection layers include a first convolutional Layer, a first Leaky_Relu activation function Layer, a second Leaky_Relu activation function Layer, a third Leaky_Relu activation function Layer, and a PC Layer path control Layer, the first bottleneck Layer to the fourth bottleneck Layer each include one BN normalization Layer and one ReLU activation function Layer, and the first convolutional Layer has a 7 × 7 structure, and the second convolutional Layer to the thirteenth convolutional Layer each have a 3 × 3 structure;
the first convolution Layer, the first Leaky _ Relu activation function Layer, the second convolution Layer, the third convolution Layer, the fourth convolution Layer, the fifth convolution Layer, the sixth convolution Layer, the seventh convolution Layer, the eighth convolution Layer, the second Leaky _ Relu activation function Layer, the ninth convolution Layer, the first bottleneck Layer, the tenth convolution Layer, the second bottleneck Layer, the eleventh convolution Layer, the third bottleneck Layer, the twelfth convolution Layer, the fourth bottleneck Layer, the thirteenth convolution Layer, the transition Layer, the third Leaky _ Relu activation function Layer and the PC Layer access control Layer are connected in sequence; wherein the first convolution Layer is an input Layer, and the PC Layer channel control Layer is an output Layer;
the output end of the second convolution layer is respectively connected with the input end of the third convolution layer, the input end of the fourth convolution layer, the input end of the fifth convolution layer, the input end of the sixth convolution layer and the input end of the seventh convolution layer; the output end of the third convolution layer is respectively connected with the input end of the fourth convolution layer, the input end of the fifth convolution layer, the input end of the sixth convolution layer and the input end of the seventh convolution layer; the output end of the fourth convolution layer is respectively connected with the input end of the fifth convolution layer, the input end of the sixth convolution layer and the input end of the seventh convolution layer; the output end of the fifth convolution layer is respectively connected with the input end of the sixth convolution layer and the input end of the seventh convolution layer; the output end of the sixth convolution layer is connected with the input end of the seventh convolution layer;
the output end of the ninth convolution layer is respectively connected with the input end of the first bottleneck layer, the input end of the second bottleneck layer, the input end of the third bottleneck layer, the input end of the fourth bottleneck layer and the input end of the transition layer; the output end of the tenth convolution layer is respectively connected with the input end of the second bottleneck layer, the input end of the third bottleneck layer, the input end of the fourth bottleneck layer and the input end of the transition layer; the eleventh convolution layer is respectively connected with the input end of the third bottleneck layer, the input end of the fourth bottleneck layer and the input end of the transition layer; the output end of the twelfth convolution layer is respectively connected with the input end of the fourth bottleneck layer and the input end of the transition layer; the output end of the thirteenth convolution layer is connected to the input end of the transition layer.
As shown in fig. 3, the 3D-LSTM unit in step S2 includes several basic modules; the basic module comprises a first current full-connection layer, and the output end of the first current full-connection layer is respectively connected with the input end of the first full-connection layer, the input end of the second full-connection layer and the input end of the third full-connection layer; the output end of the first full connection layer is connected with the output end of the first 3 multiplied by 3 convolution layer and is connected with the input end of the first adder; the output end of the second full connection layer is connected with the output end of the second 3 multiplied by 3 convolution layer and is connected with the input end of the second adder; the output end of the third full connection layer is connected with the output end of the third 3 multiplied by 3 convolutional layer and is connected with the input end of the third adder; the output end of the second adder and the output end of the third adder are respectively connected with the input end of the first multiplier, and the output end of the first multiplier is connected with the input end of the fourth adder; the input end of the first 3 × 3 × 3 convolutional layer, the input end of the second 3 × 3 × 3 convolutional layer and the input end of the third 3 × 3 × 3 convolutional layer are respectively connected with the output end of the hidden layer at the first previous moment; the output end of the first adder is respectively connected with the input end of the second multiplier and the storage unit at the previous moment; the output end of the second multiplier is connected with the output end of the fourth adder; the output end of the fourth adder is respectively connected with the hidden layer at the current moment and the storage unit at the first current moment.
The grid equations governing the 3D-LSTM unit are:

o_t = σ(W_o · T(x_t) + U_o * h_(t−1) + b_o)
i_t = σ(W_i · T(x_t) + U_i * h_(t−1) + b_i)
s_t = o_t ⊙ s_(t−1) + i_t ⊙ tanh(W_s · T(x_t) + U_s * h_(t−1) + b_s)
h_t = tanh(s_t)

wherein o_t is the output gate at time t, σ is the sigmoid function, W_o is the weight matrix of the output gate, T(·) is the input function, x_t is the input at time t, U_o is the hidden transition matrix of the output gate, * is the convolution operation, h_(t−1) is the hidden state at time t − 1, and b_o is the output gate offset; i_t is the input gate, W_i is the weight matrix of the input gate, U_i is the hidden state matrix of the input gate, and b_i is the offset of the input gate; ⊙ is element-wise multiplication, s_(t−1) is the storage unit at time t − 1, tanh is the activation function, W_s is the weight matrix of the storage unit, U_s is the hidden state matrix of the storage unit, b_s is the storage unit offset, and h_t is the hidden unit.
The pyramid pooling layers in step S2 include a feature map 16-block division layer, a feature map 4-block division layer, and a feature map 1-block division layer.
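As a sketch of such a pyramid pooling layer, the feature map can be pooled over 4 × 4, 2 × 2 and 1 × 1 grids (16, 4 and 1 blocks) and the results concatenated into a fixed-length vector; adaptive average pooling is assumed here as the block-pooling operator, which the patent does not specify:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PyramidPooling(nn.Module):
        """Pool the feature map over 4x4, 2x2 and 1x1 grids (16, 4 and 1 blocks)."""
        def __init__(self, bins=(4, 2, 1)):
            super().__init__()
            self.bins = bins
        def forward(self, x):
            pooled = [F.adaptive_avg_pool2d(x, b).flatten(1) for b in self.bins]
            return torch.cat(pooled, dim=1)   # fixed-length output regardless of input image size
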
As shown in fig. 4, the decoder in step S7 includes several sets of two-dimensional vector matrices, a pooling layer, a one-dimensional vector matrix, a full-link layer, and a Softmax activation layer.
The formula of the cross entropy loss function in step S8 is:

L(p, y) = −Σ_(i,j,k) [ y_(i,j,k) · lg p_(i,j,k) + (1 − y_(i,j,k)) · lg(1 − p_(i,j,k)) ]

wherein L(p, y) is the cross-entropy loss function, (i, j, k) is the pixel (voxel) coordinate, y_(i,j,k) is the real voxel point, p_(i,j,k) is the voxel probability, and lg is the logarithmic function with base 10.
In one embodiment of the invention, the IoU (intersection over union) value between the three-dimensional reconstruction result output by the network and the real model is adopted as the evaluation criterion of three-dimensional reconstruction accuracy:

IoU = (P ∩ Y) / (P ∪ Y) = Σ_i [ I(p_i) · I(y_i) ] / Σ_i [ I( I(p_i) + I(y_i) ) ]

wherein P is the predicted value, Y is the true value, ∩ denotes taking the intersection, ∪ denotes taking the union, I(·) is the indicator function, p_i is the i-th voxel of the predicted value, and y_i is the i-th voxel of the true value.
As shown in fig. 5, fig. 6, fig. 7, fig. 8 and table 1, the details of the reconstruction model obtained by the method of the present invention are more complete, that is, its accuracy is higher; compared with the other two algorithms, the improved 3D-R2N2 algorithm has a higher IoU value and therefore performs better on the three-dimensional reconstruction model: its reconstruction accuracy is 7.8% higher than that of the B-Rep algorithm and 5.3% higher than that of the original 3D-R2N2 algorithm.
TABLE 1 (IoU comparison of the B-Rep algorithm, the original 3D-R2N2 algorithm and the improved 3D-R2N2 algorithm; the table is rendered as an image in the original publication)
In another embodiment of the invention, a teaching building is used as the test object to evaluate the three-dimensional building target reconstruction performance of the 3D-R2N2 algorithm of the present invention. Three-dimensional reconstruction of the image is performed with the 3D-R2N2 algorithm and with the voxel-level reconstruction algorithm in turn, and the results are shown in FIGS. 9 and 10. The registration time for the optical image data and the convergence of the registration result are also simulated in turn, and the comparison is shown in table 2. The registration time of both algorithms is proportional to the data scale: as the data scale increases, the registration time increases, but the registration time of the voxel-level reconstruction algorithm on the image data is higher than that of the 3D-R2N2 algorithm in every case; analysis of the results shows that the 3D-R2N2 algorithm is 3.2% faster than the voxel-level reconstruction algorithm in three-dimensional image reconstruction. The convergence of the registration results of the two algorithms is then analyzed; the convergence of the 3D-R2N2 algorithm is stronger than that of the voxel-level reconstruction algorithm in every case, and the 3D-R2N2 algorithm reduces algorithm complexity by 15.1% compared with the voxel-level reconstruction algorithm in three-dimensional image reconstruction.
TABLE 2 (comparison of registration time and convergence of the registration result between the 3D-R2N2 algorithm of the invention and the voxel-level reconstruction algorithm; the table is rendered as an image in the original publication)
The invention designs a densely linked CNN module in which each convolution layer is connected to the subsequent convolution layers, so that every convolution layer adds the feature maps of its own layer, each subsequent convolution layer can obtain the information of the previous convolution layers, and even the last convolution layer can obtain the information of the first convolution layer, thereby making full use of the number of channels and completing information transmission. A bottleneck layer structure is added between the convolution layers of the densely linked CNN module, which reduces the dimensionality of the convolution results and alleviates the dimensionality growth caused by each convolution layer acquiring a large amount of feature information; the reduced dimensionality makes training stable and improves coding efficiency. A pyramid pooling layer is added between the extraction module and the CNN module, which unifies the image size, improves the image classification effect by fusing multi-scale features, and raises the target recognition rate. The algorithm improves the feature extraction effect, so that the details of the image reconstruction model are more complete and the accuracy is higher. Compared with traditional algorithms, the method shortens the registration time, improves the convergence of the registration result, and reduces the complexity of the algorithm.

Claims (7)

1. A three-dimensional reconstruction method of an optical building target based on 3D-R2N2 is characterized by comprising the following steps:
s1, acquiring an optical image and preprocessing the optical image;
s2, constructing a 3D-R2N2 network, and inputting the preprocessed optical image into the constructed 3D-R2N2 network; the 3D-R2N2 network comprises an image extraction module, a pyramid pooling layer, a CNN module and a 3D-LSTM unit which are connected in sequence;
s3, adjusting the size of the input optical image to be a uniform size through the pyramid pooling layer;
s4, extracting features from the uniformly sized optical image by using the CNN module (a deep residual variant) of the 3D-R2N2 network, and encoding the extracted features;
s5, performing one-dimensional convolution on the coded features, and compressing the features into 1024-dimensional feature vectors, namely low-dimensional feature vectors, through a coder;
s6, sending the low-dimensional feature vector into a 3D-LSTM unit to obtain a three-dimensional grid structure; wherein the three-dimensional grid structure comprises voxels;
s7, inputting the three-dimensional grid structure into a decoder, and improving the hidden state resolution of the three-dimensional grid structure through the decoder until the target output resolution is reached;
s8, converting the three-dimensional grid structure reaching the target output resolution into the existence probability of the voxel at the voxel coordinate point by using a cross entropy loss function, and processing the probability into a Bernoulli distribution form;
s9, establishing the existence probability of the voxels in the Bernoulli distribution form in the voxel coordinate points into a three-dimensional probability matrix;
and S10, performing pixel reconstruction through the three-dimensional probability matrix, namely completing the three-dimensional reconstruction of the optical building target.
2. The 3D-R2N2-based optical building target three-dimensional reconstruction method according to claim 1, wherein the preprocessing in step S1 comprises:
s1-1, according to the formula:

min J(u) = min ∫_Ω |∇u| dΩ = min ∫_Ω √(u_x² + u_y²) dx dy

obtaining the minimized total variation min J(u); wherein ∫_Ω(·) dΩ denotes the differential taken over Ω, Ω is the definition domain of the pixel points, u is the original sharp analog-noise high-frequency image, i.e. the optical image, x and y are the pixel coordinates in that image, ∇u is the differential operator applied to u, and u_x and u_y are the derivatives with respect to the pixel coordinates x and y;
s1-2, carrying out noise reduction processing on the optical image by utilizing the minimized total variation;
s1-3, processing the optical image after noise reduction into a vertical image, and graying the vertical image to obtain a gray image;
s1-4, extending all the area spaces with concentrated gray scales in the gray scale image to all the gray scale area space ranges to obtain a non-uniform extension and stretching gray scale image;
and S1-5, redistributing the pixel values of the non-uniform extension stretched gray-scale image to finish preprocessing.
3. The 3D-R2N2-based optical building target three-dimensional reconstruction method according to claim 1, wherein the CNN module in step S2 includes 12 convolutional layers, 5 residual connection layers, 4 bottleneck layers and 1 transition Layer, wherein the residual connection layers include a first convolutional Layer, a first Leaky_Relu activation function Layer, a second Leaky_Relu activation function Layer, a third Leaky_Relu activation function Layer and a PC Layer channel control Layer, the first bottleneck Layer to the fourth bottleneck Layer each include a BN normalization Layer and a ReLU activation function Layer, the first convolutional Layer has a 7 × 7 structure, and the second convolutional Layer to the thirteenth convolutional Layer each have a 3 × 3 structure;
the first convolution Layer, the first Leaky _ Relu activation function Layer, the second convolution Layer, the third convolution Layer, the fourth convolution Layer, the fifth convolution Layer, the sixth convolution Layer, the seventh convolution Layer, the eighth convolution Layer, the second Leaky _ Relu activation function Layer, the ninth convolution Layer, the first bottleneck Layer, the tenth convolution Layer, the second bottleneck Layer, the eleventh convolution Layer, the third bottleneck Layer, the twelfth convolution Layer, the fourth bottleneck Layer, the thirteenth convolution Layer, the transition Layer, the third Leaky _ Relu activation function Layer and the PC Layer access control Layer are connected in sequence; wherein the first convolution Layer is an input Layer, and the PC Layer channel control Layer is an output Layer;
the output end of the second convolution layer is respectively connected with the input end of the third convolution layer, the input end of the fourth convolution layer, the input end of the fifth convolution layer, the input end of the sixth convolution layer and the input end of the seventh convolution layer; the output end of the third convolution layer is respectively connected with the input end of the fourth convolution layer, the input end of the fifth convolution layer, the input end of the sixth convolution layer and the input end of the seventh convolution layer; the output end of the fourth convolution layer is respectively connected with the input end of the fifth convolution layer, the input end of the sixth convolution layer and the input end of the seventh convolution layer; the output end of the fifth convolution layer is respectively connected with the input end of the sixth convolution layer and the input end of the seventh convolution layer; the output end of the sixth convolution layer is connected with the input end of the seventh convolution layer;
the output end of the ninth convolution layer is respectively connected with the input end of the first bottleneck layer, the input end of the second bottleneck layer, the input end of the third bottleneck layer, the input end of the fourth bottleneck layer and the input end of the transition layer; the output end of the tenth convolution layer is respectively connected with the input end of the second bottleneck layer, the input end of the third bottleneck layer, the input end of the fourth bottleneck layer and the input end of the transition layer; the eleventh convolution layer is respectively connected with the input end of the third bottleneck layer, the input end of the fourth bottleneck layer and the input end of the transition layer; the output end of the twelfth convolution layer is respectively connected with the input end of the fourth bottleneck layer and the input end of the transition layer; the output end of the thirteenth convolution layer is connected to the input end of the transition layer.
4. The method for three-dimensional reconstruction of an optical building object based on 3D-R2N2, wherein the 3D-LSTM unit comprises several basic modules in step S2; the basic module comprises a first current full-connection layer, and the output end of the first current full-connection layer is respectively connected with the input end of the first full-connection layer, the input end of the second full-connection layer and the input end of the third full-connection layer; the output end of the first full connection layer is connected with the output end of the first 3 multiplied by 3 convolution layer and is connected with the input end of the first adder; the output end of the second full connection layer is connected with the output end of the second 3 multiplied by 3 convolution layer and is connected with the input end of the second adder; the output end of the third full connection layer is connected with the output end of the third 3 multiplied by 3 convolutional layer and is connected with the input end of the third adder; the output end of the second adder and the output end of the third adder are respectively connected with the input end of the first multiplier, and the output end of the first multiplier is connected with the input end of the fourth adder; the input end of the first 3 × 3 × 3 convolutional layer, the input end of the second 3 × 3 × 3 convolutional layer and the input end of the third 3 × 3 × 3 convolutional layer are respectively connected with the output end of the hidden layer at the first previous moment; the output end of the first adder is respectively connected with the input end of the second multiplier and the storage unit at the previous moment; the output end of the second multiplier is connected with the output end of the fourth adder; the output end of the fourth adder is respectively connected with the hidden layer at the current moment and the storage unit at the first current moment.
5. The 3D-R2N2-based optical building object three-dimensional reconstruction method according to claim 1, wherein the pyramid pooling layer in step S2 includes a feature mapping 16-block partition layer, a feature mapping 4-block partition layer and a feature mapping 1-block partition layer.
6. The 3D-R2N2-based optical building target three-dimensional reconstruction method according to claim 1, wherein the decoder in step S7 includes several sets of two-dimensional vector matrices, a pooling layer, a one-dimensional vector matrix, a fully connected layer and a Softmax activation layer.
7. The 3D-R2N2-based optical building target three-dimensional reconstruction method according to claim 1, wherein the formula of the cross entropy loss function in step S8 is as follows:

L(p, y) = −Σ_(i,j,k) [ y_(i,j,k) · lg p_(i,j,k) + (1 − y_(i,j,k)) · lg(1 − p_(i,j,k)) ]

wherein L(p, y) is the cross-entropy loss function, (i, j, k) is the pixel (voxel) coordinate, y_(i,j,k) is the real voxel point, p_(i,j,k) is the voxel probability, and lg is the logarithmic function with base 10.
CN202111409413.1A 2021-11-25 2021-11-25 Optical building target three-dimensional reconstruction method based on 3D-R2N2 Active CN113822825B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111409413.1A CN113822825B (en) 2021-11-25 2021-11-25 Optical building target three-dimensional reconstruction method based on 3D-R2N2

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111409413.1A CN113822825B (en) 2021-11-25 2021-11-25 Optical building target three-dimensional reconstruction method based on 3D-R2N2

Publications (2)

Publication Number Publication Date
CN113822825A true CN113822825A (en) 2021-12-21
CN113822825B CN113822825B (en) 2022-02-11

Family

ID=78918240

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111409413.1A Active CN113822825B (en) 2021-11-25 2021-11-25 Optical building target three-dimensional reconstruction method based on 3D-R2N2

Country Status (1)

Country Link
CN (1) CN113822825B (en)


Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101482971A (en) * 2009-02-23 2009-07-15 公安部第一研究所 Non-uniform correction method for compensation of low-gray scale X-ray image signal
CN102831573A (en) * 2012-08-14 2012-12-19 电子科技大学 Linear stretching method of infrared image
CN103327245A (en) * 2013-06-07 2013-09-25 电子科技大学 Automatic focusing method of infrared imaging system
CN103337053A (en) * 2013-06-13 2013-10-02 华中科技大学 Switching non-local total variation based filtering method for image polluted by salt and pepper noise
CN104143101A (en) * 2014-07-01 2014-11-12 华南理工大学 Method for automatically identifying breast tumor area based on ultrasound image
CN105954994A (en) * 2016-06-30 2016-09-21 深圳先进技术研究院 Image enhancement method for lensless digital holography microscopy imaging
CN106251315A (en) * 2016-08-23 2016-12-21 南京邮电大学 A kind of image de-noising method based on full variation
CN106355561A (en) * 2016-08-30 2017-01-25 天津大学 TV (total variation) image noise removal method based on noise priori constraint
US20200126289A1 (en) * 2016-09-23 2020-04-23 Blue Vision Labs UK Limited Method and system for creating a virtual 3d model
CN108737298A (en) * 2018-04-04 2018-11-02 东南大学 A kind of SCMA blind checking methods based on image procossing
US20200027269A1 (en) * 2018-07-23 2020-01-23 Fudan University Network, System and Method for 3D Shape Generation
CN112785684A (en) * 2020-11-13 2021-05-11 北京航空航天大学 Three-dimensional model reconstruction method based on local information weighting mechanism
CN112767272A (en) * 2021-01-20 2021-05-07 南京信息工程大学 Weight self-adaptive mixed-order fully-variable image denoising algorithm

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CHRISTOPHER B. CHOY et al.: "3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction", 《ARXIV》 *
KAIMING HE et al.: "Deep Residual Learning for Image Recognition", 《2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 *
ZHENGBO LUO et al.: "Rethinking ResNets: Improved Stacking Strategies With High Order Schemes", 《ARXIV》 *
邹倩颖 et al.: "Research on the Improved ORB-SLAM Algorithm for Outdoor Offline Real-time Navigation" (改进 ORB-SLAM 算法在户外离线即时导航的研究), 《实验室研究与探索》 (Research and Exploration in Laboratory) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116310844A (en) * 2023-05-18 2023-06-23 四川凯普顿信息技术股份有限公司 Agricultural crop growth monitoring system
CN116958455A (en) * 2023-09-21 2023-10-27 北京飞渡科技股份有限公司 Roof reconstruction method and device based on neural network and electronic equipment
CN116958455B (en) * 2023-09-21 2023-12-26 北京飞渡科技股份有限公司 Roof reconstruction method and device based on neural network and electronic equipment

Also Published As

Publication number Publication date
CN113822825B (en) 2022-02-11

Similar Documents

Publication Publication Date Title
CN108875813B (en) Three-dimensional grid model retrieval method based on geometric image
CN110390638B (en) High-resolution three-dimensional voxel model reconstruction method
CN109410321A (en) Three-dimensional rebuilding method based on convolutional neural networks
CN113822825B (en) Optical building target three-dimensional reconstruction method based on 3D-R2N2
CN112396703A (en) Single-image three-dimensional point cloud model reconstruction method
CN112818764B (en) Low-resolution image facial expression recognition method based on feature reconstruction model
CN107301643B (en) Well-marked target detection method based on robust rarefaction representation Yu Laplce's regular terms
CN113436237B (en) High-efficient measurement system of complicated curved surface based on gaussian process migration learning
CN113159232A (en) Three-dimensional target classification and segmentation method
CN104077742B (en) Human face sketch synthetic method and system based on Gabor characteristic
CN113269224A (en) Scene image classification method, system and storage medium
Kang et al. Competitive learning of facial fitting and synthesis using uv energy
Zhu et al. Nonlocal low-rank point cloud denoising for 3-D measurement surfaces
CN112581626B (en) Complex curved surface measurement system based on non-parametric and multi-attention force mechanism
CN104299201A (en) Image reconstruction method based on heredity sparse optimization and Bayes estimation model
CN113034371A (en) Infrared and visible light image fusion method based on feature embedding
CN112767539B (en) Image three-dimensional reconstruction method and system based on deep learning
CN116721216A (en) Multi-view three-dimensional reconstruction method based on GCF-MVSNet network
CN116758219A (en) Region-aware multi-view stereo matching three-dimensional reconstruction method based on neural network
CN116309221A (en) Method for constructing multispectral image fusion model
CN111062274A (en) Context-aware embedded crowd counting method, system, medium, and electronic device
CN112785684B (en) Three-dimensional model reconstruction method based on local information weighting mechanism
CN113011506B (en) Texture image classification method based on deep fractal spectrum network
CN112215241B (en) Image feature extraction device based on small sample learning
Wang et al. Rethinking separable convolutional encoders for end-to-end semantic image segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant