CN114648608A - Tunnel three-dimensional model reconstruction method based on MVSNET - Google Patents
- Publication number
- CN114648608A (application CN202210323841.0A)
- Authority
- CN
- China
- Prior art keywords
- tunnel
- image
- dimensional model
- mvsnet
- reconstruction method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
Abstract
The embodiment of the invention provides a tunnel three-dimensional model reconstruction method based on MVSNET, relating to the technical field of tunnel modeling. The method comprises the following steps: acquiring a tunnel image; extracting image features from the tunnel image; performing homography cone mapping on the extracted image features to construct a cost body; regularizing the obtained cost body to obtain a depth estimation map; and densely reconstructing the depth estimation map to obtain a three-dimensional model of the tunnel. The interior of the tunnel can be observed well through the three-dimensional model, so that fixed-point inspection can replace manual inspection: when an abnormal change in the tunnel is observed on the three-dimensional model, inspection personnel are notified to handle it at that location, thereby reducing inspection cost.
Description
Technical Field
The invention relates to the technical field of tunnel modeling, in particular to a tunnel three-dimensional model reconstruction method based on MVSNET.
Background
The normal operation of water conservancy hub infrastructure such as diversion tunnels is one of the important factors guaranteeing people's livelihood and economic development. While hydraulic engineering construction develops at high speed, how to strengthen safety monitoring of dams and form an effective intelligent system of patrol, inspection, diagnosis and maintenance has become an urgent problem in the development of hydraulic engineering.
Therefore, accurate detection of water conservancy hub infrastructure and highly visualized detection results have become an extremely important part of an intelligent inspection system, and a major challenge for engineers at present. The traditional tunnel inspection mode relies mainly on manual inspection, which is time-consuming, costly, of limited precision, and highly demanding on the inspector's experience.
Disclosure of Invention
The invention aims to provide a tunnel three-dimensional model reconstruction method based on MVSNET, which can construct an accurate tunnel three-dimensional model so as to observe the inside of the tunnel and replace manual inspection.
Embodiments of the invention may be implemented as follows:
the invention provides a tunnel three-dimensional model reconstruction method based on MVSNET, which comprises the following steps:
acquiring a tunnel image;
extracting image features from the tunnel image;
carrying out homography cone mapping by combining the extracted image features to construct a cost body;
regularizing the obtained cost body to obtain a depth estimation image;
and carrying out dense reconstruction on the depth estimation map to obtain a three-dimensional model of the tunnel.
In an alternative embodiment, the step of acquiring a tunnel image comprises:
and acquiring tunnel images through an unmanned aerial vehicle or an inspection robot.
In an alternative embodiment, the step of extracting image features from the tunnel image comprises:
and extracting depth map information of the tunnel image by using the convolutional neural network added with the attention mechanism to realize image feature extraction.
In an alternative embodiment, the step of constructing the cost volume by performing homography cone mapping in combination with the extracted image features comprises:
and combining the extracted image features and adopting a camera view cone to carry out homography cone mapping to construct a cost body.
In an alternative embodiment, the camera view frustum comprises a near plane and a far plane defined relative to the camera, the camera view frustum being formed by connecting the camera, the near plane and the far plane.
In an optional embodiment, the tunnel image includes a resource picture and a reference picture, and the step of constructing the cost body by performing homography cone mapping in combination with the extracted image features includes:
obtaining a mapping relation with a reference picture from a resource picture by utilizing homography transformation;
and determining an expression of the cost body by using the cost index based on the variance.
In an alternative embodiment, the expression of the cost body is:

$$C = \frac{1}{N}\sum_{i=1}^{N}\left(V_i - \overline{V}\right)^2$$

where $V_i$ is the i-th feature volume and $\overline{V}$ is the mean of the N feature volumes.
In an optional embodiment, the step of regularizing the obtained cost body to obtain a depth estimation map includes:
and regularizing the obtained cost body through multi-scale 3DCNN to obtain a depth estimation image.
In an optional embodiment, the step of regularizing the obtained cost volume by using multi-scale 3DCNN to obtain a depth estimation map includes:
optimizing the obtained cost body to obtain a probability body;
carrying out probability value normalization on the probability body in the depth direction by using softmax operation;
and applying the probability body to depth value prediction to obtain a depth estimation image.
In an optional embodiment, the step of regularizing the obtained cost volume by using multi-scale 3DCNN to obtain a depth estimation map further includes:
and obtaining image boundary information from the tunnel image as a guide to optimize the depth estimation map.
The tunnel three-dimensional model reconstruction method based on MVSNET provided by the embodiment of the invention has the following beneficial effects:
The method first extracts image features from the input images, then combines several slightly different feature maps to construct a 3D cost volume that stores visual difference information, then applies 3D convolution to the three-dimensional features to make them more orderly and generate an initial depth estimation map, from which a three-dimensional model of the whole tunnel is generated. The interior of the tunnel can be observed well through the three-dimensional model, fixed-point inspection can replace manual inspection, and when an abnormal change in the tunnel is observed on the three-dimensional model, inspection personnel are notified to handle it at that location, thereby reducing inspection cost.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a flowchart of a tunnel three-dimensional model reconstruction method based on MVSNET according to an embodiment of the present invention;
FIG. 2 is a schematic view of a camera view frustum;
fig. 3 is a schematic structural diagram of multi-scale 3 DCNN.
An icon: 1-camera view frustum; 2-a camera; 3-near plane; 4-far plane.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined or explained in subsequent figures.
In the description of the present invention, it should be noted that if the terms "upper", "lower", "inside", "outside", etc. indicate an orientation or a positional relationship based on that shown in the drawings or that the product of the present invention is used as it is, this is only for convenience of description and simplification of the description, and it does not indicate or imply that the device or the element referred to must have a specific orientation, be constructed in a specific orientation, and be operated, and thus should not be construed as limiting the present invention.
Furthermore, the appearances of the terms "first," "second," and the like, if any, are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
It should be noted that the features of the embodiments of the present invention may be combined with each other without conflict.
To address the common problems of the diversion-tunnel application scenario and existing methods, the embodiment of the invention adopts a three-dimensional reconstruction method based on deep learning: it first extracts a depth feature map from the input images, then combines several slightly different homography feature maps to construct a 3D cost volume (three-dimensional cost function) storing visual difference information, then applies 3D convolution to make the obtained features more orderly and generate the initial depth map, from which the three-dimensional model of the whole tunnel is generated.
Referring to fig. 1, the present embodiment provides a tunnel three-dimensional model reconstruction method based on MVSNet (Multi-view Stereo Network). The model reconstruction method comprises the following steps:
s1: and collecting tunnel images.
Specifically, the tunnel image can be collected through an unmanned aerial vehicle or a patrol robot. The tunnel image comprises a resource picture and a reference picture.
S2: and extracting image features from the tunnel image.
Specifically, a convolutional neural network (CNN) with an added attention mechanism is used to extract depth map information of the tunnel image and thereby extract the image features.
The tunnel image is passed through the convolution operations of the CNN to obtain image features; the convolutional neural network comprises convolutional layers, BN layers and activation functions. An attention mechanism is added to the convolutional neural network to perform adaptive refinement of the obtained image features, so that the network can extract more useful features.
The output of the convolutional neural network is N feature maps of 32 channels, each down-sampled by a factor of 4 in both spatial dimensions compared with the input image. While down-sampling, neighborhood information of the pixels is retained and stored in the 32-channel feature descriptors, which provide rich semantic information for feature matching on the original images and significantly improve the quality of model reconstruction.
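As a minimal illustrative sketch of the attention step, the snippet below applies a squeeze-and-excitation-style channel attention to one feature map. The patent only states that "an attention mechanism" is added, so the specific gating scheme, the reduction ratio, and the weight shapes here are assumptions:

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Squeeze-and-excitation-style channel attention (illustrative sketch).
    feat: (C, H, W) feature map; w1: (C//r, C) and w2: (C, C//r) gate weights."""
    c = feat.shape[0]
    squeeze = feat.reshape(c, -1).mean(axis=1)       # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)           # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))      # sigmoid gate in (0, 1)
    return feat * gate[:, None, None]                # reweight each channel

rng = np.random.default_rng(0)
feat = rng.standard_normal((32, 16, 16))             # one 32-channel feature map
w1 = rng.standard_normal((8, 32)) * 0.1              # reduction ratio r = 4 (assumed)
w2 = rng.standard_normal((32, 8)) * 0.1
refined = channel_attention(feat, w1, w2)
```

Because the gate lies strictly in (0, 1), the refinement only rescales channels; it never changes the spatial layout of the features.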
S3: and carrying out homography cone mapping by combining the extracted image features to construct a cost body (English name: cost volume).
The cost body is constructed based on the image features extracted in the previous step and the parameters of the cameras used for shooting. For the depth prediction task, the cost body is constructed on the view frustum of the reference camera; in other words, the extracted image features are combined with the camera view frustum to perform homography cone mapping and construct the cost body.
Referring to fig. 2, the camera view cone 1 includes a near plane 3 and a far plane 4 formed by the camera 2, and the camera view cone 1 is formed by connecting the camera 2, the near plane 3 and the far plane 4.
The cost body can be constructed specifically with the following methods:
(1) homographic transformation
Specifically, a homography (English name: Homography) is adopted; it describes the mapping relationship between two planes. In the three-dimensional reconstruction process, the mapping relation between a resource picture and the reference picture must be obtained, which requires a homography transformation. For example, the relationship between corresponding points of picture P1 and picture P2 can be described as:

$$\begin{pmatrix} x_2 \\ y_2 \\ 1 \end{pmatrix} \sim H \begin{pmatrix} x_1 \\ y_1 \\ 1 \end{pmatrix}$$

where $(x_1, y_1)$ and $(x_2, y_2)$ are the coordinates of the same point on picture P1 and picture P2 respectively, and H is a 3 × 3 homography matrix.
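For concreteness, the mapping can be applied to a point in homogeneous coordinates as follows; the matrix values below are a toy example (scale by 2, translate by (5, 3)), not taken from the patent:

```python
import numpy as np

# Toy 3x3 homography: scale by 2, then translate by (5, 3).
H = np.array([[2.0, 0.0, 5.0],
              [0.0, 2.0, 3.0],
              [0.0, 0.0, 1.0]])

def apply_homography(H, x, y):
    """Map (x, y) on picture P1 to picture P2 via homogeneous coordinates."""
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]   # divide out the projective scale

x2, y2 = apply_homography(H, 10.0, 20.0)
```

The division by the third homogeneous coordinate is what makes a general homography (unlike this affine toy) a projective, not linear, mapping of pixel coordinates.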
(2) Differentiable homography transformation
The N feature maps (English name: feature maps) containing the image features are projected onto a set of fronto-parallel planes of the reference picture to form N feature volumes (English name: feature volumes). The homography of the plane at depth value d determines the coordinate transformation from the feature map to the feature volume, and the differentiable homography is expressed as follows:

$$H_i(d) = K_i \cdot R_i \cdot \left(I - \frac{(t_1 - t_i)\, n_1^{T}}{d}\right) \cdot R_1^{T} \cdot K_1^{-1}$$

where $H_i(d)$ denotes the homography matrix between $F_i$ (i = 1, 2, …, N) and $F_1$ at depth value d, $K_i$, $R_i$ and $t_i$ are the intrinsic matrix, rotation and translation of the i-th camera, and $n_1$ is the principal axis direction of the reference camera. The homography is a 3 × 3 matrix.
The process of homography projection is similar to the classical plane sweep algorithm, the only difference being that the sampled points come from the feature maps rather than the images.
(3) Cost metric (English name: Cost Metric)
After obtaining the N feature volumes $V_1, \dots, V_N$, they are aggregated into a cost body C. To accommodate an arbitrary number of input views, MVSNet uses a variance-based cost metric M that measures the similarity between the N views. Using the cost metric M, the expression of the cost body C is determined as follows:

$$C = M(V_1, \dots, V_N) = \frac{1}{N}\sum_{i=1}^{N}\left(V_i - \overline{V}\right)^2$$

where $\overline{V}$ is the mean of the N feature volumes, and W, H, D and F are respectively the width, height, number of depth samples and number of channels of the feature map of the input image.
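The variance aggregation can be written in a few lines of NumPy; the volume sizes below are small illustrative values:

```python
import numpy as np

def variance_cost(volumes):
    """Variance-based cost metric: C = (1/N) * sum_i (V_i - mean(V))^2,
    aggregating N feature volumes of shape (F, D, H, W) into one cost body."""
    V = np.stack(volumes)                       # (N, F, D, H, W)
    return ((V - V.mean(axis=0)) ** 2).mean(axis=0)

rng = np.random.default_rng(1)
vols = [rng.standard_normal((8, 4, 6, 6)) for _ in range(3)]   # N = 3 views
C = variance_cost(vols)
```

A variance (rather than, say, concatenation) is symmetric in the views, which is why the same network handles any number of input images.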
S4: and regularizing the obtained cost body to obtain a depth estimation image.
Specifically, the obtained cost body is regularized through the multi-scale 3DCNN to obtain a depth estimation image.
The cost body computed directly from the feature maps is likely to contain noise, mainly because of non-Lambertian surfaces and occluded lines of sight; the cost body therefore needs to be regularized before a smooth depth map can be predicted.
The cost body regularization step comprises the following steps:
and optimizing the obtained cost body to obtain a probability body. Cost volume regularization is realized by adopting a multi-scale 3DCNN, as shown in FIG. 3, the structure of the multi-scale 3DCNN is similar to that of a 3D edition UNet, and the domain information aggregation is performed in a relatively large receptive field with relatively small storage and calculation cost by adopting a structural mode of a coder-decoder, including 4 scales. To reduce the computational cost of the network, after the first 3D convolutional layer, the 32-channel cost is reduced to 8 channels, and the convolutional layer at each scale is reduced from 3 layers to 2 layers. And finally outputting the cost body with the channel number of 1.
The probability values of the probability volume are normalized along the depth direction with a softmax operation, and the probability volume is applied to depth value prediction to obtain the depth estimation map.
First, an initial depth estimation map is obtained. The expectation along the depth direction is computed as a weighted sum over all hypothesised depth values:

$$D = \sum_{d = d_{min}}^{d_{max}} d \times P(d)$$

where P(d) is the estimated probability at depth value d. This operation is differentiable and approximates the result of the argmax operation. Because the depth hypotheses are uniformly sampled within the range $[d_{min}, d_{max}]$ during cost body construction, the predicted depth values are continuous. The output depth estimation map has the same size as the feature map produced by the convolution operation.
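This soft-argmin regression can be sketched as follows; the cost values and depth range are illustrative, with the regularized cost body reduced to a single channel as described above:

```python
import numpy as np

def soft_argmin_depth(cost, depths):
    """Softmax of the negated (1-channel) cost along the depth axis gives a
    probability volume; its expectation over the hypothesised depths yields a
    continuous depth estimate per pixel."""
    p = np.exp(-cost - np.max(-cost, axis=0))   # numerically stable softmax
    p /= p.sum(axis=0)
    return (p * depths[:, None, None]).sum(axis=0)

depths = np.linspace(1.0, 4.0, 16)   # uniform samples in [d_min, d_max]
cost = np.ones((16, 5, 5))           # (D, H, W) regularized cost
cost[7] = 0.0                        # lowest cost at the hypothesis depths[7]
depth_map = soft_argmin_depth(cost, depths)
```

Unlike a hard argmax over the 16 hypotheses, the expectation can land between samples, which is what makes the predicted depth continuous and the operation differentiable.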
Second, the depth estimation map is optimized. Recovering the depth estimation map from the probability volume is problematic because the large receptive field used during regularization makes the reconstructed depth boundaries too smooth, a problem that also appears in semantic segmentation and image matting. Image boundary information obtained from the reference picture is therefore used as a guide to refine the predicted depth estimation map; this is implemented by adding a depth residual learning network at the end of MVSNet.
Finally, a loss function is defined. It accounts for the losses of both the initial depth estimate and the optimized depth estimate, using the difference between the ground-truth depth map and each depth estimation map as the training loss. Since the ground-truth depth map does not have a value at every pixel, only the valid pixels are considered. The loss function is thus defined as follows:

$$Loss = \sum_{p \in P_{valid}} \left\| d(p) - \hat{d}_i(p) \right\|_1 + \lambda \left\| d(p) - \hat{d}_r(p) \right\|_1$$

where $d(p)$ is the ground-truth depth at valid pixel p, $\hat{d}_i(p)$ is the initial depth estimate, $\hat{d}_r(p)$ is the refined depth estimate, and λ is a weighting factor.
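The masked part of the loss, comparing one predicted depth map against a sparse ground truth, can be sketched as below; the 2×2 maps and the use of 0.0 as the "missing value" marker are illustrative assumptions:

```python
import numpy as np

def masked_l1(pred, gt, valid):
    """Mean absolute depth error over valid pixels only; ground-truth depth
    maps are sparse, so invalid pixels must be masked out of the loss."""
    return np.abs(pred[valid] - gt[valid]).mean()

gt = np.array([[2.0, 0.0],
               [3.0, 2.5]])          # 0.0 marks a missing ground-truth value
pred = np.array([[2.2, 9.9],
                 [2.8, 2.5]])
valid = gt > 0                        # boolean mask of valid pixels
loss = masked_l1(pred, gt, valid)
```

The full training loss would apply this term to both the initial and the refined depth estimates and weight the second term by λ.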
s5: and carrying out dense reconstruction on the depth estimation map to obtain a three-dimensional model of the tunnel.
And fusing the depth estimation images obtained by MVSNet for dense reconstruction to obtain the required three-dimensional model.
The tunnel three-dimensional model reconstruction method based on MVSNET provided by the embodiment of the invention has the following beneficial effects:
The method first extracts image features from the input images, then combines several slightly different feature maps to construct a 3D cost volume that stores visual difference information, then applies 3D convolution to the three-dimensional features to make them more orderly and generate an initial depth estimation map, from which a three-dimensional model of the whole tunnel is generated. The interior of the tunnel can be observed well through the three-dimensional model, fixed-point inspection can replace manual inspection, and when an abnormal change in the tunnel is observed on the three-dimensional model, inspection personnel are notified to handle it at that location, thereby reducing inspection cost.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A tunnel three-dimensional model reconstruction method based on MVSNET is characterized by comprising the following steps:
acquiring a tunnel image;
extracting image features from the tunnel image;
carrying out homography cone mapping by combining the extracted image features to construct a cost body;
regularizing the obtained cost body to obtain a depth estimation image;
and carrying out dense reconstruction on the depth estimation map to obtain a three-dimensional model of the tunnel.
2. The MVSNET-based tunnel three-dimensional model reconstruction method according to claim 1, wherein the step of acquiring a tunnel image comprises:
and acquiring the tunnel image through an unmanned aerial vehicle or a patrol robot.
3. The MVSNET-based tunnel three-dimensional model reconstruction method according to claim 1, wherein the step of extracting image features from the tunnel image comprises:
and extracting the depth map information of the tunnel image by using a convolutional neural network added with an attention mechanism to realize the extraction of the image characteristics.
4. The MVSNET-based tunnel three-dimensional model reconstruction method according to claim 1, wherein the step of constructing a cost body by performing homography cone mapping in combination with the extracted image features comprises:
and combining the extracted image features and a camera view cone to carry out homography cone mapping to construct the cost body.
5. The MVSNET-based tunnel three-dimensional model reconstruction method according to claim 4, wherein the camera view cone comprises a near plane and a far plane formed by a camera, and the camera view cone is formed by connecting the camera, the near plane and the far plane.
6. The MVSNET-based tunnel three-dimensional model reconstruction method according to claim 1, wherein the tunnel image comprises a resource picture and a reference picture, and the step of constructing the cost body by performing homography cone mapping by combining the extracted image features comprises the following steps of:
obtaining a mapping relation with the reference picture from the resource picture by utilizing homography transformation;
and determining an expression of the cost body by using the cost index based on the variance.
7. The MVSNET-based tunnel three-dimensional model reconstruction method according to claim 1, wherein the cost body has an expression as follows:

$$C = \frac{1}{N}\sum_{i=1}^{N}\left(V_i - \overline{V}\right)^2$$

where $V_i$ is the i-th feature volume and $\overline{V}$ is the mean of the N feature volumes.
8. The MVSNET-based tunnel three-dimensional model reconstruction method according to claim 1, wherein the step of regularizing the obtained cost body to obtain a depth estimation map comprises:
and regularizing the obtained cost body through multi-scale 3DCNN to obtain a depth estimation image.
9. The MVSNET-based tunnel three-dimensional model reconstruction method according to claim 8, wherein the step of regularizing the obtained cost body by a multi-scale 3DCNN to obtain a depth estimation map comprises:
optimizing the obtained cost body to obtain a probability body;
normalizing the probability value of the probability body in the depth direction by using a softmax operation;
and applying the probability body to depth value prediction to obtain the depth estimation image.
10. The MVSNET-based tunnel three-dimensional model reconstruction method of claim 9, wherein the step of regularizing the obtained cost body by multi-scale 3DCNN to obtain a depth estimation map further comprises:
deriving image boundary information from the tunnel image as a guide to optimize the depth estimation map.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210323841.0A CN114648608A (en) | 2022-03-29 | 2022-03-29 | Tunnel three-dimensional model reconstruction method based on MVSNET |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210323841.0A CN114648608A (en) | 2022-03-29 | 2022-03-29 | Tunnel three-dimensional model reconstruction method based on MVSNET |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114648608A true CN114648608A (en) | 2022-06-21 |
Family
ID=81995579
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210323841.0A Pending CN114648608A (en) | 2022-03-29 | 2022-03-29 | Tunnel three-dimensional model reconstruction method based on MVSNET |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114648608A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115082540A (en) * | 2022-07-25 | 2022-09-20 | 武汉图科智能科技有限公司 | Multi-view depth estimation method and device suitable for unmanned aerial vehicle platform |
CN115082540B (en) * | 2022-07-25 | 2022-11-15 | 武汉图科智能科技有限公司 | Multi-view depth estimation method and device suitable for unmanned aerial vehicle platform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||