CN114648608A - Tunnel three-dimensional model reconstruction method based on MVSNET - Google Patents
- Publication number
- CN114648608A (application CN202210323841.0A)
- Authority
- CN
- China
- Prior art keywords
- tunnel
- image
- dimensional model
- mvsnet
- reconstruction method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
Abstract
The embodiment of the invention provides a tunnel three-dimensional model reconstruction method based on MVSNET, relating to the technical field of tunnel modeling. The method comprises the following steps: acquiring a tunnel image; extracting image features from the tunnel image; performing homography cone mapping on the extracted image features to construct a cost body; regularizing the obtained cost body to obtain a depth estimation map; and densely reconstructing the depth estimation map to obtain a three-dimensional model of the tunnel. The interior of the tunnel can be observed well through the three-dimensional model, so that fixed-point inspection can replace manual inspection: when an abnormal change in the tunnel is observed on the three-dimensional model, inspection personnel are notified to handle it at that location, thereby reducing inspection cost.
Description
Technical Field
The invention relates to the technical field of tunnel modeling, in particular to a tunnel three-dimensional model reconstruction method based on MVSNET.
Background
The normal operation of water conservancy hub infrastructure such as diversion tunnels is one of the important factors guaranteeing people's livelihood and economic development. While hydraulic engineering construction develops at high speed, how to strengthen safety monitoring of dams and form an effective intelligent system of patrol, inspection, diagnosis and maintenance has become an urgent problem in the development of hydraulic engineering.
Therefore, accurate detection of water conservancy hub infrastructure and highly visualized detection results have become an extremely important part of an intelligent inspection system, and a major challenge for engineers at present. The traditional tunnel inspection mode relies mainly on manual inspection, which is time-consuming, costly, of limited precision, and highly demanding on the inspector's experience.
Disclosure of Invention
The invention aims to provide a tunnel three-dimensional model reconstruction method based on MVSNET, which can construct an accurate tunnel three-dimensional model so as to observe the inside of the tunnel and replace manual inspection.
Embodiments of the invention may be implemented as follows:
the invention provides a tunnel three-dimensional model reconstruction method based on MVSNET, which comprises the following steps:
acquiring a tunnel image;
extracting image features from the tunnel image;
carrying out homography cone mapping by combining the extracted image features to construct a cost body;
regularizing the obtained cost body to obtain a depth estimation image;
and carrying out dense reconstruction on the depth estimation map to obtain a three-dimensional model of the tunnel.
In an alternative embodiment, the step of acquiring a tunnel image comprises:
and acquiring tunnel images through an unmanned aerial vehicle or an inspection robot.
In an alternative embodiment, the step of extracting image features from the tunnel image comprises:
and extracting depth map information of the tunnel image by using the convolutional neural network added with the attention mechanism to realize image feature extraction.
In an alternative embodiment, the step of constructing the cost volume by performing homography cone mapping in combination with the extracted image features comprises:
and combining the extracted image features and adopting a camera view cone to carry out homography cone mapping to construct a cost body.
In an alternative embodiment, the camera view frustum comprises a near plane and a far plane defined relative to the camera, the camera view frustum being formed by connecting the camera, the near plane and the far plane.
In an optional embodiment, the tunnel image includes a resource picture and a reference picture, and the step of constructing the cost body by performing homography cone mapping in combination with the extracted image features includes:
obtaining a mapping relation with a reference picture from a resource picture by utilizing homography transformation;
and determining an expression of the cost body by using the cost index based on the variance.
In an alternative embodiment, the expression of the cost body is:

$$C = \frac{1}{N}\sum_{i=1}^{N}\left(V_i - \overline{V}\right)^2$$

where $V_i$ is the i-th feature volume and $\overline{V}$ is the mean of the N feature volumes.
In an optional embodiment, the step of regularizing the obtained cost body to obtain a depth estimation map includes:
and regularizing the obtained cost body through multi-scale 3DCNN to obtain a depth estimation image.
In an optional embodiment, the step of regularizing the obtained cost volume by using multi-scale 3DCNN to obtain a depth estimation map includes:
optimizing the obtained cost body to obtain a probability body;
carrying out probability value normalization on the probability body in the depth direction by using softmax operation;
and applying the probability body to depth value prediction to obtain a depth estimation image.
In an optional embodiment, the step of regularizing the obtained cost volume by using multi-scale 3DCNN to obtain a depth estimation map further includes:
and obtaining image boundary information from the tunnel image as a guide to optimize the depth estimation map.
The tunnel three-dimensional model reconstruction method based on MVSNET provided by the embodiment of the invention has the following beneficial effects:
The method first extracts image features from the input images, then combines several slightly different feature maps to construct a 3D cost volume that stores visual difference information, then applies 3D convolution to the three-dimensional features to make them more orderly and generate an initial depth estimation map, from which a three-dimensional model of the whole tunnel is generated. The interior of the tunnel can be observed well through the three-dimensional model, fixed-point inspection can replace manual inspection, and when an abnormal change in the tunnel is observed on the three-dimensional model, inspection personnel are notified to handle it at that location, thereby reducing inspection cost.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a flowchart of a tunnel three-dimensional model reconstruction method based on MVSNET according to an embodiment of the present invention;
FIG. 2 is a schematic view of a camera view frustum;
fig. 3 is a schematic structural diagram of multi-scale 3 DCNN.
An icon: 1-camera view frustum; 2-a camera; 3-near plane; 4-far plane.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined or explained in subsequent figures.
In the description of the present invention, it should be noted that if the terms "upper", "lower", "inside", "outside", etc. indicate an orientation or a positional relationship based on that shown in the drawings or that the product of the present invention is used as it is, this is only for convenience of description and simplification of the description, and it does not indicate or imply that the device or the element referred to must have a specific orientation, be constructed in a specific orientation, and be operated, and thus should not be construed as limiting the present invention.
Furthermore, the appearances of the terms "first," "second," and the like, if any, are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
It should be noted that the features of the embodiments of the present invention may be combined with each other without conflict.
To address the common problems of the diversion-tunnel application scenario and existing methods, the embodiment of the invention adopts a three-dimensional reconstruction method based on deep learning: it first extracts a depth feature map from the input images, then combines several slightly different homography feature maps to construct a 3D cost volume (three-dimensional cost function) storing visual difference information, then applies 3D convolution to make the obtained features more orderly and generate the initial depth map, from which the three-dimensional model of the whole tunnel is generated.
Referring to fig. 1, the present embodiment provides a tunnel three-dimensional model reconstruction method based on MVSNet (Multi-view Stereo Network). The model reconstruction method comprises the following steps:
s1: and collecting tunnel images.
Specifically, the tunnel image can be collected through an unmanned aerial vehicle or a patrol robot. The tunnel image comprises a resource picture and a reference picture.
S2: and extracting image features from the tunnel image.
Specifically, a convolutional neural network (CNN) with an added attention mechanism is used to extract depth map information of the tunnel image and thereby extract the image features.
The tunnel image is passed through the convolution operations of the CNN to obtain image features; the convolutional neural network comprises convolutional layers, BN layers and activation functions. An attention mechanism is added to the convolutional neural network to perform adaptive refinement of the obtained image features, so that the network can extract more useful features.
The output of the convolutional neural network is N feature maps of 32 channels, each down-sampled by a factor of 4 in both spatial dimensions compared with the input image. While down-sampling, neighborhood information of the pixels is retained and stored in the 32-channel feature descriptors, which provide rich semantic information for feature matching on the original images and significantly improve the quality of model reconstruction.
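As a minimal illustrative sketch of the attention step, the snippet below applies a squeeze-and-excitation-style channel attention to one feature map. The patent only states that "an attention mechanism" is added, so the specific gating scheme, the reduction ratio, and the weight shapes here are assumptions:

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Squeeze-and-excitation-style channel attention (illustrative sketch).
    feat: (C, H, W) feature map; w1: (C//r, C) and w2: (C, C//r) gate weights."""
    c = feat.shape[0]
    squeeze = feat.reshape(c, -1).mean(axis=1)       # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)           # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))      # sigmoid gate in (0, 1)
    return feat * gate[:, None, None]                # reweight each channel

rng = np.random.default_rng(0)
feat = rng.standard_normal((32, 16, 16))             # one 32-channel feature map
w1 = rng.standard_normal((8, 32)) * 0.1              # reduction ratio r = 4 (assumed)
w2 = rng.standard_normal((32, 8)) * 0.1
refined = channel_attention(feat, w1, w2)
```

Because the gate lies strictly in (0, 1), the refinement only rescales channels; it never changes the spatial layout of the features.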
S3: and carrying out homography cone mapping by combining the extracted image features to construct a cost body (English name: cost volume).
The cost body is constructed based on the image features extracted in the previous step and the parameters of the cameras used for shooting. For the depth prediction task, the cost body is constructed on the view frustum of the reference camera; in other words, the extracted image features are combined with the camera view frustum to perform homography cone mapping and construct the cost body.
Referring to fig. 2, the camera view cone 1 includes a near plane 3 and a far plane 4 formed by the camera 2, and the camera view cone 1 is formed by connecting the camera 2, the near plane 3 and the far plane 4.
The cost body can be constructed specifically with the following methods:
(1) homographic transformation
Specifically, a homography (English name: Homography) is adopted; it describes the mapping relationship between two planes. In the three-dimensional reconstruction process, the mapping relation between a resource picture and the reference picture must be obtained, which requires a homography transformation. For example, the relationship between corresponding points of picture P1 and picture P2 can be described as:

$$\begin{pmatrix} x_2 \\ y_2 \\ 1 \end{pmatrix} \sim H \begin{pmatrix} x_1 \\ y_1 \\ 1 \end{pmatrix}$$

where $(x_1, y_1)$ and $(x_2, y_2)$ are the coordinates of the same point on picture P1 and picture P2 respectively, and H is a 3 × 3 homography matrix.
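For concreteness, the mapping can be applied to a point in homogeneous coordinates as follows; the matrix values below are a toy example (scale by 2, translate by (5, 3)), not taken from the patent:

```python
import numpy as np

# Toy 3x3 homography: scale by 2, then translate by (5, 3).
H = np.array([[2.0, 0.0, 5.0],
              [0.0, 2.0, 3.0],
              [0.0, 0.0, 1.0]])

def apply_homography(H, x, y):
    """Map (x, y) on picture P1 to picture P2 via homogeneous coordinates."""
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]   # divide out the projective scale

x2, y2 = apply_homography(H, 10.0, 20.0)
```

The division by the third homogeneous coordinate is what makes a general homography (unlike this affine toy) a projective, not linear, mapping of pixel coordinates.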
(2) Differentiable homography transformation
The N feature maps (English name: feature maps) containing the image features are projected onto a set of fronto-parallel planes of the reference picture to form N feature volumes (English name: feature volumes). The homography of the plane at depth value d determines the coordinate transformation from the feature map to the feature volume, and the differentiable homography is expressed as follows:

$$H_i(d) = K_i \cdot R_i \cdot \left(I - \frac{(t_1 - t_i)\, n_1^{T}}{d}\right) \cdot R_1^{T} \cdot K_1^{-1}$$

where $H_i(d)$ denotes the homography matrix between $F_i$ (i = 1, 2, …, N) and $F_1$ at depth value d, $K_i$, $R_i$ and $t_i$ are the intrinsic matrix, rotation and translation of the i-th camera, and $n_1$ is the principal axis direction of the reference camera. The homography is a 3 × 3 matrix.
The process of homography projection is similar to the classical plane sweep algorithm, the only difference being that the sampled points come from the feature maps rather than the images.
(3) Cost metric (English name: Cost Metric)
After obtaining the N feature volumes $V_1, \dots, V_N$, they are aggregated into a cost body C. To accommodate an arbitrary number of input views, MVSNet uses a variance-based cost metric M that measures the similarity between the N views. Using the cost metric M, the expression of the cost body C is determined as follows:

$$C = M(V_1, \dots, V_N) = \frac{1}{N}\sum_{i=1}^{N}\left(V_i - \overline{V}\right)^2$$

where $\overline{V}$ is the mean of the N feature volumes, and W, H, D and F are respectively the width, height, number of depth samples and number of channels of the feature map of the input image.
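The variance aggregation can be written in a few lines of NumPy; the volume sizes below are small illustrative values:

```python
import numpy as np

def variance_cost(volumes):
    """Variance-based cost metric: C = (1/N) * sum_i (V_i - mean(V))^2,
    aggregating N feature volumes of shape (F, D, H, W) into one cost body."""
    V = np.stack(volumes)                       # (N, F, D, H, W)
    return ((V - V.mean(axis=0)) ** 2).mean(axis=0)

rng = np.random.default_rng(1)
vols = [rng.standard_normal((8, 4, 6, 6)) for _ in range(3)]   # N = 3 views
C = variance_cost(vols)
```

A variance (rather than, say, concatenation) is symmetric in the views, which is why the same network handles any number of input images.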
S4: and regularizing the obtained cost body to obtain a depth estimation image.
Specifically, the obtained cost body is regularized through the multi-scale 3DCNN to obtain a depth estimation image.
The cost body computed directly from the feature maps is likely to contain noise, mainly because of non-Lambertian surfaces and occluded lines of sight; the cost body therefore needs to be regularized before a smooth depth map can be predicted.
The cost body regularization step comprises the following steps:
and optimizing the obtained cost body to obtain a probability body. Cost volume regularization is realized by adopting a multi-scale 3DCNN, as shown in FIG. 3, the structure of the multi-scale 3DCNN is similar to that of a 3D edition UNet, and the domain information aggregation is performed in a relatively large receptive field with relatively small storage and calculation cost by adopting a structural mode of a coder-decoder, including 4 scales. To reduce the computational cost of the network, after the first 3D convolutional layer, the 32-channel cost is reduced to 8 channels, and the convolutional layer at each scale is reduced from 3 layers to 2 layers. And finally outputting the cost body with the channel number of 1.
The probability values of the probability volume are normalized along the depth direction with a softmax operation, and the probability volume is applied to depth value prediction to obtain the depth estimation map.
First, an initial depth estimation map is obtained. The expectation along the depth direction is computed as a weighted sum over all hypothesised depth values:

$$D = \sum_{d = d_{min}}^{d_{max}} d \times P(d)$$

where P(d) is the estimated probability at depth value d. This operation is differentiable and approximates the result of the argmax operation. Because the depth hypotheses are uniformly sampled within the range $[d_{min}, d_{max}]$ during cost body construction, the predicted depth values are continuous. The output depth estimation map has the same size as the feature map produced by the convolution operation.
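This soft-argmin regression can be sketched as follows; the cost values and depth range are illustrative, with the regularized cost body reduced to a single channel as described above:

```python
import numpy as np

def soft_argmin_depth(cost, depths):
    """Softmax of the negated (1-channel) cost along the depth axis gives a
    probability volume; its expectation over the hypothesised depths yields a
    continuous depth estimate per pixel."""
    p = np.exp(-cost - np.max(-cost, axis=0))   # numerically stable softmax
    p /= p.sum(axis=0)
    return (p * depths[:, None, None]).sum(axis=0)

depths = np.linspace(1.0, 4.0, 16)   # uniform samples in [d_min, d_max]
cost = np.ones((16, 5, 5))           # (D, H, W) regularized cost
cost[7] = 0.0                        # lowest cost at the hypothesis depths[7]
depth_map = soft_argmin_depth(cost, depths)
```

Unlike a hard argmax over the 16 hypotheses, the expectation can land between samples, which is what makes the predicted depth continuous and the operation differentiable.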
Second, the depth estimation map is optimized. Recovering the depth estimation map from the probability volume is problematic because the large receptive field used during regularization makes the reconstructed depth boundaries too smooth, a problem that also appears in semantic segmentation and image matting. Image boundary information obtained from the reference picture is therefore used as a guide to refine the predicted depth estimation map; this is implemented by adding a depth residual learning network at the end of MVSNet.
Finally, a loss function is defined. It accounts for the losses of both the initial depth estimate and the optimized depth estimate, using the difference between the ground-truth depth map and each depth estimation map as the training loss. Since the ground-truth depth map does not have a value at every pixel, only the valid pixels are considered. The loss function is thus defined as follows:

$$Loss = \sum_{p \in P_{valid}} \left\| d(p) - \hat{d}_i(p) \right\|_1 + \lambda \left\| d(p) - \hat{d}_r(p) \right\|_1$$

where $d(p)$ is the ground-truth depth at valid pixel p, $\hat{d}_i(p)$ is the initial depth estimate, $\hat{d}_r(p)$ is the refined depth estimate, and λ is a weighting factor.
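The masked part of the loss, comparing one predicted depth map against a sparse ground truth, can be sketched as below; the 2×2 maps and the use of 0.0 as the "missing value" marker are illustrative assumptions:

```python
import numpy as np

def masked_l1(pred, gt, valid):
    """Mean absolute depth error over valid pixels only; ground-truth depth
    maps are sparse, so invalid pixels must be masked out of the loss."""
    return np.abs(pred[valid] - gt[valid]).mean()

gt = np.array([[2.0, 0.0],
               [3.0, 2.5]])          # 0.0 marks a missing ground-truth value
pred = np.array([[2.2, 9.9],
                 [2.8, 2.5]])
valid = gt > 0                        # boolean mask of valid pixels
loss = masked_l1(pred, gt, valid)
```

The full training loss would apply this term to both the initial and the refined depth estimates and weight the second term by λ.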
s5: and carrying out dense reconstruction on the depth estimation map to obtain a three-dimensional model of the tunnel.
And fusing the depth estimation images obtained by MVSNet for dense reconstruction to obtain the required three-dimensional model.
The tunnel three-dimensional model reconstruction method based on MVSNET provided by the embodiment of the invention has the following beneficial effects:
The method first extracts image features from the input images, then combines several slightly different feature maps to construct a 3D cost volume that stores visual difference information, then applies 3D convolution to the three-dimensional features to make them more orderly and generate an initial depth estimation map, from which a three-dimensional model of the whole tunnel is generated. The interior of the tunnel can be observed well through the three-dimensional model, fixed-point inspection can replace manual inspection, and when an abnormal change in the tunnel is observed on the three-dimensional model, inspection personnel are notified to handle it at that location, thereby reducing inspection cost.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A tunnel three-dimensional model reconstruction method based on MVSNET is characterized by comprising the following steps:
acquiring a tunnel image;
extracting image features from the tunnel image;
carrying out homography cone mapping by combining the extracted image features to construct a cost body;
regularizing the obtained cost body to obtain a depth estimation image;
and carrying out dense reconstruction on the depth estimation map to obtain a three-dimensional model of the tunnel.
2. The MVSNET-based tunnel three-dimensional model reconstruction method according to claim 1, wherein the step of acquiring a tunnel image comprises:
and acquiring the tunnel image through an unmanned aerial vehicle or a patrol robot.
3. The MVSNET-based tunnel three-dimensional model reconstruction method according to claim 1, wherein the step of extracting image features from the tunnel image comprises:
and extracting the depth map information of the tunnel image by using a convolutional neural network added with an attention mechanism to realize the extraction of the image characteristics.
4. The MVSNET-based tunnel three-dimensional model reconstruction method according to claim 1, wherein the step of constructing a cost body by performing homography cone mapping in combination with the extracted image features comprises:
and combining the extracted image features and a camera view cone to carry out homography cone mapping to construct the cost body.
5. The MVSNET-based tunnel three-dimensional model reconstruction method according to claim 4, wherein the camera view cone comprises a near plane and a far plane formed by a camera, and the camera view cone is formed by connecting the camera, the near plane and the far plane.
6. The MVSNET-based tunnel three-dimensional model reconstruction method according to claim 1, wherein the tunnel image comprises a resource picture and a reference picture, and the step of constructing the cost body by performing homography cone mapping by combining the extracted image features comprises the following steps of:
obtaining a mapping relation with the reference picture from the resource picture by utilizing homography transformation;
and determining an expression of the cost body by using the cost index based on the variance.
7. The MVSNET-based tunnel three-dimensional model reconstruction method according to claim 1, wherein the cost body has an expression as follows:

$$C = \frac{1}{N}\sum_{i=1}^{N}\left(V_i - \overline{V}\right)^2$$

where $V_i$ is the i-th feature volume and $\overline{V}$ is the mean of the N feature volumes.
8. The MVSNET-based tunnel three-dimensional model reconstruction method according to claim 1, wherein the step of regularizing the obtained cost body to obtain a depth estimation map comprises:
and regularizing the obtained cost body through multi-scale 3DCNN to obtain a depth estimation image.
9. The MVSNET-based tunnel three-dimensional model reconstruction method according to claim 8, wherein the step of regularizing the obtained cost body by a multi-scale 3DCNN to obtain a depth estimation map comprises:
optimizing the obtained cost body to obtain a probability body;
normalizing the probability value of the probability body in the depth direction by using a softmax operation;
and applying the probability body to depth value prediction to obtain the depth estimation image.
10. The MVSNET-based tunnel three-dimensional model reconstruction method of claim 9, wherein the step of regularizing the obtained cost body by multi-scale 3DCNN to obtain a depth estimation map further comprises:
deriving image boundary information from the tunnel image as a guide to optimize the depth estimation map.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210323841.0A CN114648608A (en) | 2022-03-29 | 2022-03-29 | Tunnel three-dimensional model reconstruction method based on MVSNET |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210323841.0A CN114648608A (en) | 2022-03-29 | 2022-03-29 | Tunnel three-dimensional model reconstruction method based on MVSNET |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114648608A true CN114648608A (en) | 2022-06-21 |
Family
ID=81995579
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210323841.0A Pending CN114648608A (en) | 2022-03-29 | 2022-03-29 | Tunnel three-dimensional model reconstruction method based on MVSNET |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114648608A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115082540A (en) * | 2022-07-25 | 2022-09-20 | 武汉图科智能科技有限公司 | Multi-view depth estimation method and device suitable for unmanned aerial vehicle platform |
CN115082540B (en) * | 2022-07-25 | 2022-11-15 | 武汉图科智能科技有限公司 | Multi-view depth estimation method and device suitable for unmanned aerial vehicle platform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||