CN116310131A - Three-dimensional reconstruction method considering multi-view fusion strategy - Google Patents

Three-dimensional reconstruction method considering multi-view fusion strategy Download PDF

Info

Publication number
CN116310131A
CN116310131A CN202310315104.0A CN202310315104A CN116310131A CN 116310131 A CN116310131 A CN 116310131A CN 202310315104 A CN202310315104 A CN 202310315104A CN 116310131 A CN116310131 A CN 116310131A
Authority
CN
China
Prior art keywords
view
stage
depth
map
cost
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310315104.0A
Other languages
Chinese (zh)
Inventor
路锦正
黄炳森
李强
彭波
赵集
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest University of Science and Technology
Original Assignee
Southwest University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest University of Science and Technology filed Critical Southwest University of Science and Technology
Priority to CN202310315104.0A priority Critical patent/CN116310131A/en
Publication of CN116310131A publication Critical patent/CN116310131A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30244Camera pose
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a three-dimensional reconstruction method considering a multi-view fusion strategy. In the first stage, the N-1 source feature maps are each warped by a homography transformation against the reference feature map to construct cost volumes, and each cost volume is passed into View-Net to obtain a weight map of that view relative to the reference image; the weight maps are then fused with the cost volumes by weighting, finally yielding a cost volume containing the useful information elements. The cost volume is regularized by conventional 3D convolution and a low-resolution depth map is output, whose depth information initializes the preset depth of the next stage; after three stages the final predicted depth map is obtained and the three-dimensional reconstruction is completed. The invention provides a new formulation for multi-view cost-volume fusion, improves the usability of matched pixels and reduces the interference of unmatched pixels, thereby obtaining a more accurate depth estimation map and improving the completeness of the three-dimensional point-cloud reconstruction.

Description

Three-dimensional reconstruction method considering multi-view fusion strategy
Technical Field
The invention belongs to the technical field of three-dimensional reconstruction, and particularly relates to a three-dimensional reconstruction method considering a multi-view fusion strategy.
Background
With the continuous growth of imaging demands and the rising requirements for scene understanding, three-dimensional spatial vision has become increasingly important: observing a scene in three-dimensional space reveals structures and details that are missing from a two-dimensional plane, making the user experience more realistic and reliable. Conventional three-dimensional reconstruction equipment can reconstruct the three-dimensional space of pictures captured by a camera, but its cost is very high, which is insufficient to support daily use, so it cannot be popularized to the general public.
Multi-view stereo (MVS) aims to recover 3D scene geometry from a set of RGB images with known camera poses and to obtain a dense 3D model of a real-world scene from multiple images. It has many important applications, such as document reconstruction, virtual reality, autonomous driving and defect detection. Compared with conventional MVS approaches that rely on hand-crafted matching metrics for image consistency checking, deep-learning-based MVS methods typically use frontal plane sweeping to evaluate the same set of candidate depths for every pixel of the same image, and achieve higher accuracy and completeness than the prior art on many MVS benchmarks. Although learning-based MVS achieves remarkable results, many aspects remain to be solved and optimized to further improve the quality of point-cloud reconstruction. Convolutional neural networks (CNNs) have been widely used for three-dimensional reconstruction and broader computer-vision tasks; recent multi-view stereo matching algorithms typically compute a 3D cost volume over a set of hypothesized depths and apply 3D convolutions to the cost volume to regularize it and regress the final scene depth.
Several studies have demonstrated the importance of cost-volume construction to depth-prediction accuracy. In the existing MVS pipeline, N-1 source images and 1 reference image are input, a two-view cost volume is constructed from the homography relationship between each source image and the reference image, the N-1 two-view cost volumes are then compressed into a final cost volume by a specific fusion scheme, and finally 3D CNN or RNN regularization is used for depth prediction. One very important issue is the judgement of pixel visibility in each view and the selection of views. The interference of irrelevant views and the introduction of erroneous pixels can render the verification of the final score ineffective or even negatively affected, resulting in erroneous depth estimation.
The original MVSNet uses a view-fusion strategy in the traditional sense: it computes the variance distribution of the different two-view cost volumes and uses the variance as the weight to combine them, and many variant algorithms follow this scheme. Other methods apply average or max pooling to aggregate the matching costs, or directly use certain implicit network layers to obtain so-called adaptive weight maps, which lack a reasonable theoretical explanation. Although a network may implicitly learn how to discard unmatched pixels in a view, the interference of unmatched pixels still inevitably degrades the final reconstruction.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art and provide a three-dimensional reconstruction method considering a multi-view fusion strategy, so as to solve the problem that, in existing three-dimensional reconstruction, the interference of irrelevant views and the propagation of erroneous pixels render the verification of the final score ineffective or even negatively affected, causing erroneous depth estimation.
In order to achieve the above purpose, the invention adopts the following technical scheme:
A three-dimensional reconstruction method taking into account a multi-view fusion strategy, comprising the steps of:
S1, acquiring a plurality of pictures captured by a camera;
S2, extracting features from the pictures to obtain three feature maps of different sizes;
S3, mapping the N-1 source feature maps of a stage onto the hypothesis depth planes of the reference view by homography transformation;
S4, constructing N-1 initial cost volumes based on the N-1 source feature maps and the reference feature map, respectively;
S5, performing adaptive weight training on the N-1 initial cost volumes through View-Net to obtain a weight map corresponding to each initial cost volume;
S6, weighting and fusing the weight maps with the initial cost volumes to obtain the cost volume to be regularized;
S7, regularizing the cost volume of step S6 to obtain a probability map, and generating the depth map of this stage based on the probability map;
S8, taking the depth map of this stage from step S7 as the initialization of the preset depth of the second stage, and cyclically executing steps S3 to S7 to generate the depth map of the second stage;
S9, taking the depth map of the second stage as the initialization of the preset depth of the third stage, and cyclically executing steps S3 to S7 to generate the final predicted depth map of the third stage;
and S10, completing the three-dimensional reconstruction based on the final predicted depth map.
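For orientation, the cascaded pipeline of steps S1 to S10 can be sketched in PyTorch-style Python as below. This is a minimal structural sketch, not the exact implementation of the invention: the callables passed in (feature_extractor, build_cost_volumes, view_net, regularizer, depth_from_prob) are hypothetical placeholders for the components detailed in the embodiments.

```python
from typing import Callable, List, Sequence
import torch

def cascaded_depth_prediction(
    images: torch.Tensor,            # (N, 3, H, W): 1 reference picture + N-1 source pictures (S1)
    feature_extractor: Callable,     # S2: returns one list of per-view feature maps per stage
    build_cost_volumes: Callable,    # S3-S4: homography warping -> N-1 two-view cost volumes
    view_net: Callable,              # S5: per-cost-volume weight map of shape (B, H, W)
    regularizer: Callable,           # S7: fused cost volume -> probability volume
    depth_from_prob: Callable,       # S7: probability volume -> depth map
    num_stages: int = 3,
) -> torch.Tensor:
    features: List[Sequence[torch.Tensor]] = feature_extractor(images)
    depth = None                                      # no depth initialization before stage 1
    for k in range(num_stages):
        cost_volumes = build_cost_volumes(features[k], prev_depth=depth)       # S3, S4
        weights = [view_net(c, stage=k)[:, None, None] for c in cost_volumes]  # S5, expanded to (B,1,1,H,W)
        fused = sum(w * c for w, c in zip(weights, cost_volumes)) / sum(weights)  # S6: weighted fusion
        prob = regularizer(fused, stage=k)                                     # S7: regularization
        depth = depth_from_prob(prob)       # stage-k depth map; initializes the next stage (S8, S9)
    return depth                            # final predicted depth map used for reconstruction (S10)
```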
Further, in step S3, the N-1 source feature maps of a stage are mapped onto the hypothesis depth planes of the reference view by homography transformation, so as to obtain the implicit depth relationship between images of different view angles:
H_i(d) = K_i · R_i · (I − (t_0 − t_i) · n_0^T / d) · R_0^T · K_0^(−1)
where H_i(d) is the homography between the feature map of the i-th view and the reference feature map at depth d; K_i, R_i, t_i are the camera intrinsics, rotation and translation of the i-th view, respectively; n_0 is the principal axis of the reference camera; and I, K_0, R_0, t_0 are the identity matrix, the reference camera intrinsics, and the reference view rotation-translation extrinsics, respectively.
Further, step S4 specifically comprises:
according to the 1 reference feature map, the N-1 source feature maps and the corresponding camera and pose parameters, warping the n-th source feature map by homography transformation onto the preset planes of the reference camera, thereby constructing the N-1 initial cost volumes.
Further, in step S5, adaptive weight training is performed on the N-1 initial cost volumes through View-Net to obtain the weight map corresponding to each initial cost volume:
[The formula is rendered only as an image in the original publication; it maps the temporary cost volume x to the view weight map V(x) using an exp(−x) term.]
where V(x) is the view weight map and exp(−x) denotes e raised to the power −x.
Further, in step S6, the weight maps and the initial cost volumes are weighted and fused to obtain the cost volume to be regularized:
V_total(k) = [ Σ_{n=1}^{N-1} View_n(k) · Warp_n(k) ] / [ Σ_{n=1}^{N-1} View_n(k) ]
where V_total(k) is the fused cost volume of the k-th stage, k ∈ {1, 2, 3}; Warp_n(k) is the initial cost volume obtained by homographic warping of the n-th source view to the reference view at the k-th stage; and View_n(k) is the one-dimensional weight information output by View-Net for the n-th initial cost volume at the k-th stage.
Further, the average absolute difference between the true depth map and the estimated depth map is calculated using the smooth L1 loss, and the three stage losses are accumulated as the final loss:
Loss = Σ_{k=1}^{3} Σ_{p ∈ P_valid} SmoothL1( d(p, k) − d̂(p, k) )
where k is the cascade stage, P_valid is the set of pixels with valid true depth, d(p, k) is the true depth value of pixel p at the k-th stage, and d̂(p, k) is the predicted depth value of pixel p at the k-th stage.
The three-dimensional reconstruction method considering the multi-view fusion strategy provided by the invention has the following beneficial effects:
the invention provides a general aggregation network, which uses a general sub-network serving as an MVS pipeline network for training pixel-level confidence of a two-view cost body, and provides a new expression mode for multi-view cost body fusion, so as to improve the usability of matched pixels and reduce the interference of unmatched pixels, further obtain a more accurate depth map and improve the integrity of three-dimensional point cloud reconstruction.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a network architecture of the present invention.
FIG. 3 illustrates mapping the N-1 source feature maps onto the hypothesis depth planes of the reference view by homography transformation in accordance with the present invention.
FIG. 4 is a schematic diagram of a two-view fusion strategy according to the present invention.
FIG. 5 is a two-view weighting module according to the present invention.
Fig. 6 shows, for Scan1 of the DTU test set, the fusion weights obtained by each two-view cost volume after passing through View-Net.
FIG. 7 is a plot of depth-error accuracy at different thresholds as the number of views increases.
Detailed Description
The following description of the embodiments of the present invention is provided to facilitate understanding of the invention by those skilled in the art. It should be understood, however, that the invention is not limited to the scope of the embodiments; to those skilled in the art, all inventions that make use of the inventive concept fall within the protection of the spirit and scope of the invention as defined by the appended claims.
Example 1
This embodiment provides a three-dimensional reconstruction method considering a multi-view fusion strategy, which addresses the problem that, in existing methods, the uncertainty of view selection and of pixel homography warping introduces degrading factors into the cost-volume fusion process. With reference to FIG. 1, the method specifically comprises the following steps:
S1, acquiring a plurality of pictures captured by a camera;
Referring to FIG. 2, the input of this embodiment comprises a plurality of captured pictures and the corresponding camera parameters.
S2, extracting features from the pictures to obtain three feature maps of different sizes;
In this embodiment, an FPN architecture is adopted as the feature-extraction layer to obtain feature maps of three different sizes (N×C×H×W), one for each stage.
the present embodiment assumes that the depth is uniformly sampled (e.g., 1-192 mm) from a range of depths throughout all stages. The first stage acquires image features at low resolution and constructs a cost volume through homography mapping at a predetermined depth range and larger depth interval, and the subsequent stage uses high spatial resolution, narrower depth range and smaller depth interval to obtain a finer depth prediction map.
In this embodiment, the sub-network is described taking the first stage as an example. The 0th feature map is the reference feature map and the 1st to (N-1)-th feature maps are source feature maps. The N-1 source feature maps are each warped by homography transformation against the reference feature map to construct cost volumes, and each cost volume is passed into View-Net to obtain a weight map of that view relative to the reference map; the weight map suppresses non-matching information in the view and reinforces matchable information, acting similarly to an attention mechanism. This specifically comprises steps S3 to S6:
S3, mapping the N-1 source feature maps of this stage onto the hypothesis depth planes of the reference view by homography transformation;
Referring to FIG. 3, according to the epipolar principle, the N-1 source feature maps are mapped onto the d hypothesis planes of the reference view by homography warping, thereby obtaining the implicit depth relationships between images of different views.
The coordinate mapping is determined by homography:
H_i(d) = K_i · R_i · (I − (t_0 − t_i) · n_0^T / d) · R_0^T · K_0^(−1)
where H_i(d) is the homography between the feature map of the i-th view and the reference feature map at depth d; in addition, K_i, R_i, t_i are the camera intrinsics, rotation and translation of the i-th view, respectively, and n_0 is the principal axis of the reference camera. The 2D feature maps are then warped into the hypothesis planes of the reference camera using the differentiable homography transformation to form a plurality of two-view cost volumes.
S4, constructing N-1 initial cost volumes based on the N-1 source feature maps and the reference view, respectively;
Referring to FIG. 3, the 1 reference feature map, the N-1 source feature maps and the corresponding camera and pose parameters are input; the n-th source feature map is warped by homography transformation onto the preset planes of the reference camera, and the N-1 initial cost volumes are constructed.
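The differentiable homography warp used here is a standard operation in learning-based MVS. The following is a minimal PyTorch sketch of it, not the exact implementation of the invention; it assumes the camera parameters are packed as 4×4 projection matrices K·[R|t] and that the depth hypotheses are given per pixel of the reference view.

```python
import torch
import torch.nn.functional as F

def homography_warp(src_feat, src_proj, ref_proj, depth_hyps):
    """Warp a source feature map onto the hypothesis planes of the reference view.

    src_feat:   (B, C, H, W)    source feature map
    src_proj:   (B, 4, 4)       source 4x4 projection matrix K.[R|t]
    ref_proj:   (B, 4, 4)       reference 4x4 projection matrix K.[R|t]
    depth_hyps: (B, D, H, W)    depth hypotheses expressed in the reference view
    returns:    (B, C, D, H, W) warped source features, one slice per hypothesis
    """
    B, C, H, W = src_feat.shape
    D = depth_hyps.shape[1]
    with torch.no_grad():
        # relative transform from the reference camera to the source camera
        proj = src_proj @ torch.inverse(ref_proj)                 # (B, 4, 4)
        rot, trans = proj[:, :3, :3], proj[:, :3, 3:4]            # (B, 3, 3), (B, 3, 1)

        y, x = torch.meshgrid(
            torch.arange(H, dtype=src_feat.dtype, device=src_feat.device),
            torch.arange(W, dtype=src_feat.dtype, device=src_feat.device),
            indexing="ij")
        ones = torch.ones(H * W, dtype=src_feat.dtype, device=src_feat.device)
        xyz = torch.stack((x.reshape(-1), y.reshape(-1), ones))   # (3, H*W) homogeneous pixel coords
        xyz = xyz.unsqueeze(0).expand(B, -1, -1)                  # (B, 3, H*W)
        rot_xyz = rot @ xyz                                       # rotated rays, (B, 3, H*W)
        # scale each ray by every depth hypothesis, then translate: (B, 3, D, H*W)
        pts = rot_xyz.unsqueeze(2) * depth_hyps.view(B, 1, D, H * W) + trans.unsqueeze(3)
        grid_x = pts[:, 0] / (pts[:, 2] + 1e-6)                   # perspective division
        grid_y = pts[:, 1] / (pts[:, 2] + 1e-6)
        # normalize pixel coordinates to [-1, 1] for grid_sample
        grid = torch.stack((grid_x / ((W - 1) / 2) - 1, grid_y / ((H - 1) / 2) - 1), dim=-1)

    warped = F.grid_sample(src_feat, grid.view(B, D * H * W, 1, 2),
                           mode="bilinear", padding_mode="zeros", align_corners=True)
    return warped.view(B, C, D, H, W)
```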
Step S5: the two-view weight module of this embodiment performs adaptive weight training on the N-1 initial cost volumes through View-Net to obtain a weight map (1×H×W) corresponding to each initial cost volume.
Referring to FIG. 5, the input of the View-Net module is a two-view cost volume. A Conv3d layer first reduces the number of channels to 1/2 of the original, with BatchNorm3d-ReLU as the activation of that layer for fast convergence in the initial phase. The number of channels is then reduced to one (1×D×H×W) by a further Conv3d layer. The result can be regarded as a temporary cost volume x, which is converted into the final usefulness map (view weight map) V(x) of size 1×H×W by the following formula:
[The formula is rendered only as an image in the original publication; it maps x to the per-pixel view weight V(x) using an exp(−x) term.]
To match the coarse-to-fine training strategy, three independent View-Nets are set for the three different cascade stages; their specific settings are shown in Table 1.
TABLE 1. View-Net settings for the three stages, where the original input resolution is 512×640 and the input of each stage is in turn 1/4, 1/2 and 1 times the original resolution.
[The table contents are rendered only as an image in the original publication.]
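For illustration, a PyTorch sketch of such a View-Net-style module follows. The kernel sizes are assumptions rather than the settings of Table 1, and the final reduction from (1×D×H×W) to (1×H×W), whose formula is rendered only as an image in the original, is approximated here by a sigmoid (which contains the exp(−x) term mentioned above) followed by a max over the depth dimension.

```python
import torch
import torch.nn as nn

class ViewNetSketch(nn.Module):
    """Minimal sketch of a two-view weight module (View-Net style).

    Input:  a two-view cost volume of shape (B, C, D, H, W).
    Output: a per-pixel view weight map of shape (B, H, W).
    """
    def __init__(self, in_channels: int):
        super().__init__()
        self.reduce = nn.Sequential(
            nn.Conv3d(in_channels, in_channels // 2, kernel_size=3, padding=1),  # halve the channels
            nn.BatchNorm3d(in_channels // 2),                                     # BatchNorm3d-ReLU activation
            nn.ReLU(inplace=True),
            nn.Conv3d(in_channels // 2, 1, kernel_size=3, padding=1),             # -> (B, 1, D, H, W)
        )

    def forward(self, cost_volume: torch.Tensor) -> torch.Tensor:
        x = self.reduce(cost_volume).squeeze(1)      # temporary cost volume x, (B, D, H, W)
        # ASSUMED depth-collapsing mapping: sigmoid squashing (uses exp(-x)) + max over depth
        return torch.sigmoid(x).max(dim=1)[0]        # view weight map, (B, H, W)
```

In the cascade, one such module would be instantiated per stage with that stage's feature channel count, consistent with the three independent View-Nets described above.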
Step S6: referring to FIG. 4, the two-view fusion strategy of this embodiment weights and fuses the weight maps with the initial cost volumes to obtain the cost volume to be regularized.
Specifically, each weight map is expanded in dimension by unsqueeze, multiplied with its cost volume, the products are summed, and the sum is finally divided by the accumulated weights to obtain the fused final cost volume, implemented by the following formula:
V_total(k) = [ Σ_{n=1}^{N-1} View_n(k) · Warp_n(k) ] / [ Σ_{n=1}^{N-1} View_n(k) ]
where Warp_n(k) is the initial cost volume obtained by homographic warping of the n-th source view to the reference view at the k-th stage, View_n(k) is the one-dimensional weight information output by View-Net for the n-th initial cost volume at the k-th stage, and V_total(k) is the fused cost volume at the k-th stage, k ∈ {1, 2, 3}.
By adopting this fusion strategy, the sampling of valid matching pixels from the different source views is improved and the interference of non-matching pixels is suppressed; a fusion weight map between each source view and the reference view is finally obtained, and all the two-view cost volumes are fused by pixel-level weights, reducing the interference information in the final cost volume.
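A minimal PyTorch sketch of this pixel-weighted fusion is given below; the small eps term added to the denominator is an assumption for numerical safety and is not stated in the original.

```python
import torch

def fuse_two_view_cost_volumes(warped, weights, eps: float = 1e-6) -> torch.Tensor:
    """Fuse N-1 two-view cost volumes with per-pixel view weights (step S6).

    warped:  list of N-1 tensors Warp_n(k), each of shape (B, C, D, H, W)
    weights: list of N-1 tensors View_n(k), each of shape (B, H, W)
    returns: the fused cost volume V_total(k), shape (B, C, D, H, W)
    """
    num = torch.zeros_like(warped[0])
    den = torch.zeros_like(weights[0])[:, None, None]      # (B, 1, 1, H, W) accumulated weights
    for cost, w in zip(warped, weights):
        w = w[:, None, None]                               # unsqueeze to (B, 1, 1, H, W) for broadcasting
        num = num + w * cost                               # weight and accumulate each cost volume
        den = den + w
    return num / (den + eps)                               # divide by the accumulated weights
```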
Step S7: regularizing the cost volume of step S6 to obtain a probability map, and generating the depth map of this stage based on the probability map;
The regularization in this step obtains the probability map, and the predicted depth map of this stage is generated from the probability map directly by conventional means, so the specific process is not repeated in this embodiment.
Step S8: taking the depth map of this stage from step S7 as the initialization of the preset depth of the second stage, and cyclically executing steps S3 to S7 to generate the predicted depth map of the second stage;
Step S9: taking the depth map of the second stage as the initialization of the preset depth of the third stage, and cyclically executing steps S3 to S7 to generate the final predicted depth map of the third stage;
Step S10: completing the three-dimensional reconstruction based on the final predicted depth map;
This step completes the three-dimensional reconstruction from the final predicted depth map using conventional means in the field, so the detailed process is not repeated.
In this embodiment, losses are computed for the outputs of the three cascade stages: each stage computes the average absolute difference between the real depth map and the estimated (predicted) depth map using the smooth L1 loss, and the three stage losses are accumulated as the final loss:
Loss = Σ_{k=1}^{3} Σ_{p ∈ P_valid} SmoothL1( d(p, k) − d̂(p, k) )
where k is the cascade stage, P_valid is the set of pixels with valid true depth, d(p, k) is the true depth value of pixel p at the k-th stage, and d̂(p, k) is the predicted depth value of pixel p at the k-th stage.
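A PyTorch sketch of this cascaded loss follows; equal weighting of the three stages is an assumption, since the original gives no per-stage weights.

```python
import torch
import torch.nn.functional as F

def cascade_smooth_l1_loss(pred_depths, gt_depths, valid_masks) -> torch.Tensor:
    """Smooth-L1 depth loss over valid pixels, accumulated over the cascade stages.

    pred_depths, gt_depths, valid_masks: lists with one (B, H, W) tensor per stage;
    valid_masks marks the pixels that have valid ground-truth depth.
    """
    total = pred_depths[0].new_zeros(())                 # scalar accumulator
    for pred, gt, mask in zip(pred_depths, gt_depths, valid_masks):
        mask = mask.bool()
        if mask.any():                                   # skip stages with no valid ground truth
            total = total + F.smooth_l1_loss(pred[mask], gt[mask], reduction="mean")
    return total
```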
Example 2
This example performs the evaluation and verification of the method steps of Example 1.
Specifically, this example trains the model on the DTU training set. In the training phase, the number of input images is set to N = 3 and the image resolution is 512×640, consistent with conventional MVS practice. For coarse-to-fine regularization, depth hypotheses are sampled from 425 mm to 935 mm; the number of plane-sweep depth hypotheses for the three stages is 48, 32 and 8, respectively, and the corresponding depth interval decays by factors of 0.25 and 0.5 from the coarsest stage to the finer stages. The model is trained with Adam for 16 epochs with an initial learning rate of 0.001, decayed by a factor of 0.5 after epochs 6, 8 and 12, using a batch size of 2 on one NVIDIA RTX 3090 GPU, with one batch occupying 6 GB of memory.
The proposed method is evaluated on the evaluation set of the DTU dataset with the official evaluation criteria. In the evaluation phase, N = 5 and the input resolution is 864×1152; the quantitative evaluation is shown in Table 2.
Table 2. Quantitative evaluation on the DTU dataset (lower is better); the method herein outperforms our baseline network and most other advanced methods in terms of completeness.
[The table contents are rendered only as an image in the original publication.]
From the partial multi-view stereo qualitative results on the DTU dataset corresponding to the table above, the point clouds reconstructed by the method of the present invention are significantly denser and more complete.
To verify the advantages of the fusion strategy proposed by the present invention and to visually show its contribution to MVS, the visibility map of each two-view cost volume after passing through View-Net is visualized in FIG. 6. It is evident from the figure that different viewing angles contribute differently to the reference view: a lighter area indicates that more of its pixels match the reference view, while a darker area implies that the area has little relationship with the reference view. In other words, the amount of useful information obtained when observing the same object from different perspectives is not the same, and the weights accordingly distribute the importance of the homography-warped pixels within the reference view.
from Vis-MVSNet, a good multi-view fusion strategy should be unaffected by the number of views, and not result in a drop in results due to the addition of source views. Performance tests using different numbers of source views on large and small scene data sets, respectively, are therefore performed for the proposed fusion strategy.
Verification tests were performed on the DTU dataset and the BlendedMVS dataset, respectively, and the experimental results are shown in Table 3 and FIG. 7. The experiments show that as the number of views increases, the depth-estimation accuracy of the method gradually improves; in particular, the improvement at high-accuracy thresholds such as 2 mm and 4 mm is obvious.
Table 3. From left to right, the proportion of depth-map errors below different accuracy thresholds when 2 to 8 views are introduced.
[The table contents are rendered only as an image in the original publication.]
The above verification highlights how the proposed method filters image information: the computation of invalid image pixels is avoided and the estimation of correct pixels is improved, thereby improving the completeness of the reconstructed point cloud.
Although specific embodiments of the invention have been described in detail with reference to the accompanying drawings, it should not be construed as limiting the scope of protection of the present patent. Various modifications and variations which may be made by those skilled in the art without the creative effort are within the scope of the patent described in the claims.

Claims (6)

1. A three-dimensional reconstruction method taking a multi-view fusion strategy into consideration, characterized by comprising the following steps:
S1, acquiring a plurality of pictures captured by a camera;
S2, extracting features from the pictures to obtain three feature maps of different sizes;
S3, mapping the N-1 source feature maps of a stage onto the hypothesis depth planes of the reference view by homography transformation;
S4, constructing N-1 initial cost volumes based on the N-1 source feature maps and the reference view, respectively;
S5, performing adaptive weight training on the N-1 initial cost volumes through View-Net to obtain a weight map corresponding to each initial cost volume;
S6, weighting and fusing the weight maps with the initial cost volumes to obtain the cost volume to be regularized;
S7, regularizing the cost volume of step S6 to obtain a probability map, and generating the depth map of this stage based on the probability map;
S8, taking the depth map of this stage from step S7 as the initialization of the preset depth of the second stage, and cyclically executing steps S3 to S7 to generate the depth map of the second stage;
S9, taking the depth map of the second stage as the initialization of the preset depth of the third stage, and cyclically executing steps S3 to S7 to generate the final predicted depth map of the third stage;
and S10, completing the three-dimensional reconstruction based on the final predicted depth map.
2. The three-dimensional reconstruction method considering a multi-view fusion strategy according to claim 1, wherein in step S3 the N-1 source feature maps of a stage are mapped onto the hypothesis depth planes of the reference view by homography transformation, so as to obtain the implicit depth relationship between images of different view angles:
H_i(d) = K_i · R_i · (I − (t_0 − t_i) · n_0^T / d) · R_0^T · K_0^(−1)
where H_i(d) is the homography between the feature map of the i-th view and the reference feature map at depth d; K_i, R_i, t_i are the camera intrinsics, rotation and translation of the i-th view, respectively; n_0 is the principal axis of the reference camera; and I, K_0, R_0, t_0 are the identity matrix, the reference camera intrinsics, and the reference view rotation-translation extrinsics, respectively.
3. The three-dimensional reconstruction method according to claim 2, wherein step S4 specifically comprises:
according to the 1 reference feature map, the N-1 source feature maps and the corresponding camera and pose parameters, warping the n-th source feature map by homography transformation onto the preset planes of the reference camera, thereby constructing the N-1 initial cost volumes.
4. The three-dimensional reconstruction method considering a multi-view fusion strategy according to claim 1, wherein in step S5 adaptive weight training is performed on the N-1 initial cost volumes through View-Net to obtain the weight map corresponding to each initial cost volume:
[The formula is rendered only as an image in the original publication; it maps the temporary cost volume x to the view weight map V(x) using an exp(−x) term.]
where V(x) is the view weight map and exp(·) denotes raising e to the given power.
5. The three-dimensional reconstruction method considering a multi-view fusion strategy according to claim 4, wherein in step S6 the weight maps and the initial cost volumes are weighted and fused to obtain the cost volume to be regularized:
V_total(k) = [ Σ_{n=1}^{N-1} View_n(k) · Warp_n(k) ] / [ Σ_{n=1}^{N-1} View_n(k) ]
where V_total(k) is the fused cost volume of the k-th stage, k ∈ {1, 2, 3}; Warp_n(k) is the initial cost volume obtained by homographic warping of the n-th source view to the reference view at the k-th stage; and View_n(k) is the one-dimensional weight information output by View-Net for the n-th initial cost volume at the k-th stage.
6. The three-dimensional reconstruction method considering a multi-view fusion strategy according to claim 1, wherein the average absolute difference between the true depth map and the estimated depth map is calculated using the smooth L1 loss, and the three stage losses are accumulated as the final loss:
Loss = Σ_{k=1}^{3} Σ_{p ∈ P_valid} SmoothL1( d(p, k) − d̂(p, k) )
where k is the cascade stage, P_valid is the set of pixels with valid true depth, d(p, k) is the true depth value of pixel p at the k-th stage, and d̂(p, k) is the predicted depth value of pixel p at the k-th stage.
CN202310315104.0A 2023-03-28 2023-03-28 Three-dimensional reconstruction method considering multi-view fusion strategy Pending CN116310131A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310315104.0A CN116310131A (en) 2023-03-28 2023-03-28 Three-dimensional reconstruction method considering multi-view fusion strategy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310315104.0A CN116310131A (en) 2023-03-28 2023-03-28 Three-dimensional reconstruction method considering multi-view fusion strategy

Publications (1)

Publication Number Publication Date
CN116310131A true CN116310131A (en) 2023-06-23

Family

ID=86816681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310315104.0A Pending CN116310131A (en) 2023-03-28 2023-03-28 Three-dimensional reconstruction method considering multi-view fusion strategy

Country Status (1)

Country Link
CN (1) CN116310131A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117437363A (en) * 2023-12-20 2024-01-23 安徽大学 Large-scale multi-view stereoscopic method based on depth perception iterator
CN117437363B (en) * 2023-12-20 2024-03-22 安徽大学 Large-scale multi-view stereoscopic method based on depth perception iterator
CN117671163A (en) * 2024-02-02 2024-03-08 苏州立创致恒电子科技有限公司 Multi-view three-dimensional reconstruction method and system
CN117671163B (en) * 2024-02-02 2024-04-26 苏州立创致恒电子科技有限公司 Multi-view three-dimensional reconstruction method and system

Similar Documents

Publication Publication Date Title
Lin et al. Dynamic spatial propagation network for depth completion
Wang et al. 360sd-net: 360 stereo depth estimation with learnable cost volume
WO2018127007A1 (en) Depth image acquisition method and system
CN103854283B (en) A kind of mobile augmented reality Tracing Registration method based on on-line study
CN116310131A (en) Three-dimensional reconstruction method considering multi-view fusion strategy
CN110070598B (en) Mobile terminal for 3D scanning reconstruction and 3D scanning reconstruction method thereof
CN106023303B (en) A method of Three-dimensional Gravity is improved based on profile validity and is laid foundations the dense degree of cloud
CN106023230B (en) A kind of dense matching method of suitable deformation pattern
CN115205489A (en) Three-dimensional reconstruction method, system and device in large scene
CN110956661B (en) Method for calculating dynamic pose of visible light and infrared camera based on bidirectional homography matrix
CN109544628B (en) Accurate reading identification system and method for pointer instrument
CN114067197B (en) Pipeline defect identification and positioning method based on target detection and binocular vision
CN109859137B (en) Wide-angle camera irregular distortion global correction method
CN111784778A (en) Binocular camera external parameter calibration method and system based on linear solving and nonlinear optimization
CN110910456B (en) Three-dimensional camera dynamic calibration method based on Harris angular point mutual information matching
CN113744337A (en) Synchronous positioning and mapping method integrating vision, IMU and sonar
CN110033461B (en) Mobile phone anti-shake function evaluation method based on target displacement estimation
CN113393439A (en) Forging defect detection method based on deep learning
CN114119739A (en) Binocular vision-based hand key point space coordinate acquisition method
CN116129037B (en) Visual touch sensor, three-dimensional reconstruction method, system, equipment and storage medium thereof
CN112150518B (en) Attention mechanism-based image stereo matching method and binocular device
CN115601406A (en) Local stereo matching method based on fusion cost calculation and weighted guide filtering
CN113963117A (en) Multi-view three-dimensional reconstruction method and device based on variable convolution depth network
CN115359127A (en) Polarization camera array calibration method suitable for multilayer medium environment
CN111062900B (en) Binocular disparity map enhancement method based on confidence fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination