CN112288788A - Monocular image depth estimation method - Google Patents

Monocular image depth estimation method

Info

Publication number
CN112288788A
Authority
CN
China
Prior art keywords
image
depth
training
monocular
training image
Prior art date
Legal status
Granted
Application number
CN202011084248.2A
Other languages
Chinese (zh)
Other versions
CN112288788B (en)
Inventor
霍智勇
乔璐
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202011084248.2A
Publication of CN112288788A
Application granted
Publication of CN112288788B
Legal status: Active
Anticipated expiration

Classifications

    • G06T7/50 Depth or shape recovery (G06T7/00 Image analysis)
    • G06N3/045 Combinations of networks (G06N3/04 Architecture, e.g. interconnection topology)
    • G06N3/08 Learning methods (G06N3/02 Neural networks)
    • G06T5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T7/11 Region-based segmentation (G06T7/10 Segmentation; Edge detection)
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20221 Image fusion; Image merging (G06T2207/20212 Image combination)
    • Y02T10/40 Engine management systems (Y02T10/10 Internal combustion engine [ICE] based vehicles)

Abstract

A monocular image depth estimation method, comprising: acquiring a training image; inputting the acquired training image into a pre-constructed depth prediction network for training to obtain a corresponding predicted depth map; and performing a joint loss calculation on the obtained predicted depth map and the corresponding ground-truth (GT) depth map using a joint loss function combining a ranking loss, a multi-scale structural similarity loss and a multi-scale invariant gradient matching loss, to obtain a corresponding monocular depth estimation map. With this scheme, the accuracy of monocular image depth estimation can be improved.

Description

Monocular image depth estimation method
Technical Field
The invention relates to the technical field of image processing, in particular to a monocular image depth estimation method.
Background
The acquisition of three-dimensional depth information from two-dimensional images is an important problem in computer vision and an important component of understanding the geometric relationships of a scene. Image depth information has important applications in simultaneous localization and mapping (SLAM), navigation, object detection, semantic segmentation and other fields.
Unlike conventional methods based on multi-view geometry or binocular stereo matching, monocular image depth estimation performs depth estimation using only the image of a single viewpoint. Because most real-world application scenarios provide only single-viewpoint data, monocular depth estimation is closer to actual application requirements.
However, existing monocular image depth estimation methods suffer from low accuracy.
Disclosure of Invention
The invention aims to provide a monocular image depth estimation method to improve the accuracy of monocular image depth estimation.
In order to solve the above technical problem, an embodiment of the present invention provides a monocular image depth estimation method, where the method includes:
acquiring a training image;
inputting the obtained training image into a depth prediction network which is constructed in advance for training to obtain a corresponding prediction depth map;
and performing a joint loss calculation on the obtained predicted depth map and the corresponding GT depth map using a joint loss function combining a ranking loss, a multi-scale structural similarity loss and a multi-scale invariant gradient matching loss, to obtain a corresponding monocular depth estimation map.
Optionally, before inputting the acquired training image into a pre-constructed depth prediction network for training, the method further includes:
augmenting the training image to obtain a first training image;
adjusting the augmented training image to a preset resolution to obtain a second training image;
and performing normalization processing on the second training image to obtain a preprocessed training image.
Optionally, the augmenting the training image includes: performing at least one of scaling, rotation and random horizontal flipping on the training image.
Optionally, the inputting the acquired training image into a pre-constructed depth prediction network for training includes:
inputting the preprocessed training image into a preset ResNet50 network to obtain a plurality of specific-layer feature images with successively reduced resolution;
performing reverse-order traversal on the plurality of specific-layer feature images to obtain the current specific-layer feature image being traversed;
merging the current specific-layer feature image with its corresponding fused feature image to generate an image with the same resolution as the preceding specific-layer feature image in the sequence, until the traversal of the plurality of specific-layer feature images is completed; wherein the corresponding fused feature image is obtained by performing residual convolution on the feature image of the next specific layer and fusing the result with a bilinearly upsampled image of the feature image of that specific layer.
Optionally, the joint loss function is:
L = L_rank + αL_ms-ssim + βL_grad
wherein L represents the joint loss function, L_rank represents the ranking loss based on random sampling, L_ms-ssim represents the multi-scale structural similarity loss function, L_grad represents the multi-scale invariant gradient matching loss function, α represents the balance factor of the multi-scale structural similarity loss function, and β represents the balance factor of the multi-scale invariant gradient matching loss function.
Optionally, the number of the specific layer feature images is 4.
Optionally, the GT depth map is obtained as the horizontal component of the optical flow of a binocular image computed using FlowNet2.0.
Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:
according to the scheme, a training image is obtained; inputting the obtained training image into a depth prediction network which is constructed in advance for training to obtain a corresponding prediction depth map; and performing joint loss calculation on the obtained prediction depth map and the corresponding GT depth map by adopting a joint loss function of sequencing loss, multi-scale structure similarity loss and multi-scale invariant gradient matching loss to obtain a corresponding monocular depth estimation map. According to the scheme, during training, the predicted depth map and the GT depth map are subjected to combined loss function calculation of sequencing loss, multi-scale structure similarity loss and multi-scale invariant gradient matching loss, the problems of geometric inconsistency and edge blurring of the predicted depth map caused by only adopting sequencing loss based on random sampling point pairs are solved, and therefore the accuracy of the predicted depth map can be improved.
Drawings
FIG. 1 is a flowchart illustrating a monocular image depth estimation method according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of inputting an acquired training image into a pre-constructed depth prediction network for training in the embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
Because a single-viewpoint image provides relatively little information and depth estimation from it depends heavily on scene semantics, a method trained on one dataset often performs worse when applied to another dataset; improving the generalization and accuracy of monocular depth estimation therefore remains challenging.
For depth estimation datasets, depth sensors (e.g., Kinect, laser scanners) have previously been used to obtain accurate depth measurements, but these are limited to rigid objects or sparse reconstructions; such datasets lack diversity and do not generalize to outdoor images. Another source of RGB-D images is synthetic data, which is noise-free and provides accurate depth measurements and clear depth discontinuities, but the domain gap between synthetic and real data requires domain adaptation for practical use. To explore the diversity of the visual world, interest in in-the-wild scene images has grown. For example, Chen et al. proposed the in-the-wild dataset DIW, containing manually annotated relative-depth point pairs; Li et al. proposed MegaDepth, a dataset built from Internet photos of hundreds of well-known landmarks; and Ke Xian et al. proposed the ReDWeb dataset, which obtains dense relative depth maps from binocular images collected from the web.
Conventional monocular depth estimation is based on geometric methods: camera pose and a sparse point cloud are estimated with SLAM or Structure from Motion (SfM), and dense depth values are then obtained with Multi-View Stereo (MVS). Such methods can produce highly accurate reconstructions, but they require strict usage conditions and cannot handle non-rigid regions in the images well. In recent years, deep learning has developed rapidly, and monocular image depth estimation methods based on deep learning have attracted attention. Eigen et al. first proposed using a convolutional neural network for monocular depth estimation. The basic idea is a two-scale network consisting of a global coarse-scale network and a local fine-scale network: the former produces a low-resolution coarse depth map, and the latter refines its output to obtain the final fine depth map, but the predicted depth map still has low accuracy and poor detail. Eigen et al. later improved on this basis by adding a third-scale network to output higher-resolution predictions. However, such multi-scale network methods rely on supervised training with datasets containing real depth, which are difficult to produce and small in number, so the applicable scenes and generalization ability of the algorithms are limited by the datasets. Chen et al. proposed depth prediction using relative depth, i.e., training on an in-the-wild dataset containing manually labeled relative-depth point pairs to predict the relative depth of the input image. Ke Xian et al. randomly sample point pairs from the GT depth map obtained from binocular images and the predicted depth map generated by a deep convolutional network, and apply the pairwise ranking loss proposed by Chen et al.
In summary, deep-learning methods can automatically extract image features with convolutional neural networks to describe depth information and obtain depth predictions. However, such methods still suffer from poor generalization and insufficient depth map accuracy.
According to the technical scheme, during training the predicted depth map and the GT depth map are evaluated with a joint loss function combining a ranking loss, a multi-scale structural similarity loss and a multi-scale invariant gradient matching loss. This overcomes the geometric inconsistency and edge blurring of the predicted depth map caused by using only a ranking loss based on randomly sampled point pairs, so the accuracy of the predicted depth map can be improved.
Fig. 1 is a flowchart illustrating a monocular image depth estimation method according to an embodiment of the present invention. Referring to fig. 1, a monocular image depth estimation method in the embodiment of the present invention may specifically include:
step S101: a training image is acquired.
Step S102: inputting the acquired training image into a pre-constructed depth prediction network for training to obtain a corresponding predicted depth map.
Step S103: performing a joint loss calculation on the obtained predicted depth map and the corresponding GT depth map using a joint loss function combining a ranking loss, a multi-scale structural similarity loss and a multi-scale invariant gradient matching loss, to obtain a corresponding monocular depth estimation map.
The above steps S101 to S103 are described in detail below with reference to the accompanying drawings.
Step S101 is executed to acquire a training image.
In specific implementation, in order to improve the generalization capability of depth prediction, the training dataset should be as diverse as possible. The NYU v2 dataset is only applicable to indoor scenes, the KITTI dataset only includes road-related scenes, and the Make3D dataset is small, containing only 959 outdoor scene images. Based on this, the invention selects the ReDWeb dataset as the training set, which contains 3600 pairs of RGB images and GT depth maps. The optical flow is obtained from each binocular stereo image pair with a FlowNet2.0 network, and its horizontal component is used as the supervision depth map; however, since some images contain large texture-less regions such as sky, a pre-trained RefineNet is used to segment the sky region, and the segmented sky region is set to the maximum pixel value in the GT depth map. Meanwhile, a validation set containing 1410 images from the DIW dataset proposed by Chen et al. is used as the validation dataset herein.
In the embodiment of the invention, after the training image is acquired, the method further comprises preprocessing the training image sample data. Specifically, this may include: first, augmenting the training image sample data using data augmentation, including scaling, rotation and random horizontal flipping, to obtain a corresponding first training image; second, adjusting the augmented first training image to a preset resolution, such as 384 × 384, to obtain a second training image, which is then fed into the depth prediction network whose encoder is ResNet; and finally, performing normalization processing on the sample image data obtained through the preceding preprocessing operations. The normalization is computed as follows:
y_i[channel] = (x_i[channel] − mean[channel]) / std[channel]
wherein x_i[channel] represents the three-channel pixel values of the training image obtained after preprocessing, y_i[channel] represents the pixel values of the training image after normalization, mean[channel] represents the mean of the pixel values of the training image, and std[channel] represents the standard deviation of the pixel values of the training image.
In an embodiment of the present invention, mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225].
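As an illustration only, the preprocessing described above might be implemented with torchvision transforms as sketched below; the concrete augmentation ranges and the use of torchvision are assumptions, and only the 384 × 384 resolution and the mean/std values come from the text.

    import torchvision.transforms as T

    # Assumed preprocessing pipeline for the RGB training images:
    # random scaling / rotation / horizontal flip (data augmentation),
    # resize to 384 x 384, then per-channel normalization with the
    # mean/std values quoted above.
    preprocess = T.Compose([
        T.RandomHorizontalFlip(p=0.5),
        T.RandomRotation(degrees=5),                 # assumed rotation range
        T.RandomResizedCrop(384, scale=(0.8, 1.0)),  # assumed scaling range
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406],
                    std=[0.229, 0.224, 0.225]),
    ])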
Step S102 is executed: the acquired training image is input into the pre-constructed depth prediction network for training, and a corresponding predicted depth map is obtained.
In specific implementation, when the acquired training image is input into a pre-constructed depth prediction network for training:
first, the preprocessed training images are input into a preset ResNet50 network, and a plurality of specific layer feature images with successively reduced resolution are obtained.
Referring to fig. 2, for example, preprocessed training image data of size 384 × 384 is used as the input of the depth prediction network. The ResNet50 network serving as the encoder is divided into 4 different building blocks according to the resolution of the output feature maps; the feature map sizes (in W × H × C form) output by the blocks are 96 × 96 × 256, 48 × 48 × 512, 24 × 24 × 1024 and 12 × 12 × 2048, respectively, and the final feature map is 1/32 the size of the input image.
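For illustration, the four block outputs could be collected from a standard torchvision ResNet-50 as sketched below; the use of torchvision and its layer names is an assumption, since the patent only specifies ResNet50 as the encoder.

    import torch.nn as nn
    import torchvision.models as models

    class ResNet50Encoder(nn.Module):
        """Returns the four block outputs at 1/4, 1/8, 1/16 and 1/32 resolution."""
        def __init__(self):
            super().__init__()
            r = models.resnet50(pretrained=True)   # pre-trained weights, as in the text
            self.stem = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
            self.layer1, self.layer2 = r.layer1, r.layer2
            self.layer3, self.layer4 = r.layer3, r.layer4

        def forward(self, x):            # x: B x 3 x 384 x 384
            x = self.stem(x)
            f1 = self.layer1(x)          # B x 256  x 96 x 96
            f2 = self.layer2(f1)         # B x 512  x 48 x 48
            f3 = self.layer3(f2)         # B x 1024 x 24 x 24
            f4 = self.layer4(f3)         # B x 2048 x 12 x 12
            return [f1, f2, f3, f4]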
Secondly, reverse-order traversal is performed on the plurality of specific-layer feature images to obtain the current specific-layer feature image being traversed; the current specific-layer feature image is merged with its corresponding fused feature image to generate an image with the same resolution as the preceding specific-layer feature image in the sequence, until all specific-layer feature images have been traversed. The corresponding fused feature image is obtained by performing residual convolution on the feature image of the next specific layer and fusing the result with a bilinearly upsampled image of the feature image of that specific layer.
Directly applying simple upsampling or deconvolution to the feature maps obtained by ResNet produces only a coarse predicted depth map. Two methods can be used to obtain better prediction results: dilated (atrous) convolution and multi-scale feature fusion. The former occupies too much memory and easily produces checkerboard artifacts, whereas the latter saves memory and produces high-quality predictions; multiple multi-scale feature fusion modules are therefore used in the network decoder.
Referring to fig. 2, the forward propagation of the feature fusion part in the decoder includes: first, the last specific-layer feature map generated by ResNet50 is upsampled, expanding from 12 × 12 × 2048 to 24 × 24 × 2048; then, a residual convolution block is applied to the current specific-layer feature map, and the result is merged with the fused feature map generated by the preceding feature fusion module; finally, the merged result is passed through a residual convolution block and upsampling to generate a feature map with the same resolution as the next input block, until all four specific-layer feature images have been traversed.
In order to generate the final depth prediction result, the output of the three feature fusion modules is input into an adaptive output module, which comprises two 3 × 3 convolution layers and a bilinear upsampling layer, yielding the predicted depth map with a final size of 384 × 384.
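A simplified sketch of one possible feature-fusion decoder step and adaptive output module follows; the channel widths, the 1 × 1 projection and the internal layout of the residual convolution blocks are assumptions, since the patent only describes the overall data flow.

    import torch.nn as nn
    import torch.nn.functional as F

    class ResidualConvBlock(nn.Module):
        """Assumed residual convolution unit used inside each fusion module."""
        def __init__(self, ch):
            super().__init__()
            self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
            self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)

        def forward(self, x):
            return x + self.conv2(F.relu(self.conv1(F.relu(x))))

    class FeatureFusion(nn.Module):
        """One decoder step: project the encoder feature, refine it, add the
        fusion result coming from the deeper level, refine again, upsample 2x."""
        def __init__(self, in_ch, mid_ch=256):
            super().__init__()
            self.project = nn.Conv2d(in_ch, mid_ch, 1)   # assumed 1x1 channel projection
            self.res_in = ResidualConvBlock(mid_ch)
            self.res_out = ResidualConvBlock(mid_ch)

        def forward(self, enc_feat, deeper_fused=None):
            x = self.res_in(self.project(enc_feat))
            if deeper_fused is not None:
                x = x + deeper_fused                     # merge with deeper fusion result
            x = self.res_out(x)
            return F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)

    class AdaptiveOutput(nn.Module):
        """Two 3x3 convolution layers followed by bilinear upsampling to 384 x 384."""
        def __init__(self, ch=256):
            super().__init__()
            self.conv1 = nn.Conv2d(ch, ch // 2, 3, padding=1)
            self.conv2 = nn.Conv2d(ch // 2, 1, 3, padding=1)

        def forward(self, x, out_size=(384, 384)):
            x = self.conv2(F.relu(self.conv1(x)))
            return F.interpolate(x, size=out_size, mode='bilinear', align_corners=False)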
In order to obtain the optimal solution of the depth prediction network model, in the embodiment of the invention a joint loss calculation is performed on the obtained predicted depth map and the corresponding GT depth map using a joint loss function combining a ranking loss, a multi-scale structural similarity loss and a multi-scale invariant gradient matching loss, to obtain a corresponding monocular depth estimation map.
Specifically, before training, the network weights need to be initialized: the encoder part is initialized with the weights of a pre-trained ResNet50 network, and the decoder part is initialized with random numbers drawn from a normal distribution with mean 0 and variance 0.01. After the preprocessed monocular image is input into the depth prediction network in step S102, a reasonable loss function needs to be proposed as the constraint for optimizing the network parameters. For the predicted depth image, both global accuracy and local accuracy must be evaluated; using only a ranking loss based on random sampling cannot produce depth predictions that are geometrically consistent and have accurate edges. A joint loss function combining a ranking loss, a multi-scale structural similarity loss and a multi-scale invariant gradient matching loss is therefore adopted to improve the accuracy of the predicted depth map. The loss function used for training is as follows:
L = L_rank + αL_ms-ssim + βL_grad    (2)
wherein L represents the joint loss function, L_rank represents the ranking loss based on random sampling, L_ms-ssim represents the multi-scale structural similarity loss function, L_grad represents the multi-scale invariant gradient matching loss function, α represents the balance factor of the multi-scale structural similarity loss function, and β represents the balance factor of the multi-scale invariant gradient matching loss function.
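For illustration, the three terms might be combined as in the minimal sketch below; the values of the balance factors α and β are not disclosed in the patent and are placeholders here, and ranking_loss, ms_ssim_loss and gradient_matching_loss refer to the term sketches given further below.

    def joint_loss(pred, gt, alpha=1.0, beta=1.0):
        """L = L_rank + alpha * L_ms-ssim + beta * L_grad (alpha, beta assumed)."""
        return (ranking_loss(pred, gt)
                + alpha * ms_ssim_loss(pred, gt)
                + beta * gradient_matching_loss(pred, gt))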
The ranking loss function is calculated as follows:
L_rank = Σ_{i=1}^{N} φ(p_i,0, p_i,1)

and:

φ(p_i,0, p_i,1) = log(1 + exp(−l_i·(p_i,0 − p_i,1))), if l_i ≠ 0
φ(p_i,0, p_i,1) = (p_i,0 − p_i,1)², if l_i = 0

l_i = +1, if p*_i,0 / p*_i,1 ≥ 1 + τ;  l_i = −1, if p*_i,0 / p*_i,1 ≤ 1/(1 + τ);  l_i = 0, otherwise
where N represents the number of randomly sampled point pairs, φ(p_i,0, p_i,1) represents the pairwise ranking loss on the predicted depth map, p_i,0 and p_i,1 represent the depth values of a point pair on the predicted depth image, l represents the ordering label of the corresponding point pair on the GT depth map, p*_i,0 and p*_i,1 represent the depth values of the point pair on the GT depth image, and τ represents the threshold, set to 0.02.
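A minimal PyTorch-style sketch of this random-sampling ranking term follows; the number of sampled pairs and the uniform sampling strategy are assumptions, and pred and gt are assumed to be B × 1 × H × W tensors.

    import torch

    def ranking_loss(pred, gt, num_pairs=5000, tau=0.02):
        """Pairwise ranking loss over randomly sampled point pairs (sketch)."""
        b, _, h, w = pred.shape
        idx0 = torch.randint(0, h * w, (b, num_pairs), device=pred.device)
        idx1 = torch.randint(0, h * w, (b, num_pairs), device=pred.device)
        p, g = pred.view(b, -1), gt.view(b, -1)
        p0, p1 = p.gather(1, idx0), p.gather(1, idx1)
        g0, g1 = g.gather(1, idx0), g.gather(1, idx1)

        # ordering label from the GT depth ratio, with threshold tau
        ratio = g0 / (g1 + 1e-8)
        label = torch.zeros_like(ratio)
        label[ratio >= 1 + tau] = 1.0
        label[ratio <= 1.0 / (1 + tau)] = -1.0

        loss_ordered = torch.log1p(torch.exp(-label * (p0 - p1)))  # l != 0
        loss_equal = (p0 - p1) ** 2                                # l == 0
        return torch.where(label != 0, loss_ordered, loss_equal).mean()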
The multi-scale structural similarity loss L_ms-ssim is calculated as follows:
L_ms-ssim = 1 − [l_M(p, p*)]^α_M · Π_{j=1}^{M} [c_j(p, p*)]^β_j · [s_j(p, p*)]^γ_j
wherein c_j(p, p*) and s_j(p, p*) respectively represent the comparison of the predicted depth with the GT depth in terms of contrast and structure at scale j; l_M(p, p*) represents the luminance comparison, performed only at the highest scale M; α_M, β_j and γ_j adjust the relative importance of the different components; to simplify parameter selection, α_j = β_j = γ_j is set at scale j.
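One way to realize this term is to reuse an off-the-shelf MS-SSIM implementation; the sketch below assumes the third-party pytorch_msssim package and depth maps normalized to [0, 1], neither of which is specified by the patent.

    from pytorch_msssim import ms_ssim  # third-party package, assumed available

    def ms_ssim_loss(pred, gt):
        """Multi-scale structural similarity loss: 1 - MS-SSIM(pred, gt)."""
        return 1.0 - ms_ssim(pred, gt, data_range=1.0)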
The multi-scale invariant gradient matching loss L_grad is calculated as follows:

L_grad = (1/M) Σ_s Σ_i ( |∇_x R_i^s| + |∇_y R_i^s| )
where M represents the number of pixels of the GT depth map, R_i^s represents the difference between the pixel values of the predicted depth map and those of the GT depth map at scale s, and s indexes the scales. In an embodiment of the present invention, the number of scales is 4.
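A minimal sketch of the multi-scale gradient matching term follows; building the scales by repeated 2× average pooling and averaging over pixels (rather than an explicit 1/M sum) are assumptions.

    import torch.nn.functional as F

    def gradient_matching_loss(pred, gt, num_scales=4):
        """Multi-scale gradient matching on the residual R = pred - gt (sketch)."""
        loss = 0.0
        p, g = pred, gt
        for _ in range(num_scales):
            r = p - g
            grad_x = (r[:, :, :, 1:] - r[:, :, :, :-1]).abs()
            grad_y = (r[:, :, 1:, :] - r[:, :, :-1, :]).abs()
            loss = loss + grad_x.mean() + grad_y.mean()
            p = F.avg_pool2d(p, 2)   # next (coarser) scale
            g = F.avg_pool2d(g, 2)
        return loss / num_scales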
By adopting the technical scheme of the embodiment of the invention, a training image is acquired, the acquired training image is input into a pre-constructed depth prediction network for training to obtain a corresponding predicted depth map, and a joint loss calculation is performed on the obtained predicted depth map and the corresponding GT depth map using a joint loss function combining a ranking loss, a multi-scale structural similarity loss and a multi-scale invariant gradient matching loss, to obtain a corresponding monocular depth estimation map. During training, evaluating the predicted depth map against the GT depth map with this joint loss function overcomes the geometric inconsistency and edge blurring that arise when only a ranking loss based on randomly sampled point pairs is used, so the accuracy of the predicted depth map can be improved.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, and the storage medium may include: ROM, RAM, magnetic disks, optical disks, and the like.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (7)

1. A monocular image depth estimation method, comprising:
acquiring a training image;
inputting the obtained training image into a depth prediction network which is constructed in advance for training to obtain a corresponding prediction depth map;
and performing a joint loss calculation on the obtained predicted depth map and the corresponding GT depth map using a joint loss function combining a ranking loss, a multi-scale structural similarity loss and a multi-scale invariant gradient matching loss, to obtain a corresponding monocular depth estimation map.
2. The monocular image depth estimation method of claim 1, wherein before inputting the acquired training image into a pre-constructed depth prediction network for training, further comprising:
augmenting the training image to obtain a first training image;
adjusting the augmented training image to a preset resolution to obtain a second training image;
and performing normalization processing on the second training image to obtain a preprocessed training image.
3. The monocular image depth estimation method of claim 2, wherein the augmenting the training image comprises: performing at least one of scaling, rotation and random horizontal flipping on the training image.
4. The monocular image depth estimation method according to claim 2 or 3, wherein the inputting the acquired training image into a pre-constructed depth prediction network for training comprises:
inputting the preprocessed training image into a preset ResNet50 network to obtain a plurality of specific-layer feature images with successively reduced resolution;
performing reverse-order traversal on the plurality of specific-layer feature images to obtain the current specific-layer feature image being traversed;
merging the current specific-layer feature image with its corresponding fused feature image to generate an image with the same resolution as the preceding specific-layer feature image in the sequence, until the traversal of the plurality of specific-layer feature images is completed; wherein the corresponding fused feature image is obtained by performing residual convolution on the feature image of the next specific layer and fusing the result with a bilinearly upsampled image of the feature image of that specific layer.
5. The monocular image depth estimation method of claim 4, wherein the joint loss function is:
L = L_rank + αL_ms-ssim + βL_grad
wherein L represents the joint loss function, L_rank represents the ranking loss based on random sampling, L_ms-ssim represents the multi-scale structural similarity loss function, L_grad represents the multi-scale invariant gradient matching loss function, α represents the balance factor of the multi-scale structural similarity loss function, and β represents the balance factor of the multi-scale invariant gradient matching loss function.
6. The monocular image depth estimation method of claim 4, wherein the number of the specific layer feature images is 4.
7. The monocular image depth estimation method of claim 1, wherein the GT depth map is obtained as the horizontal component of the optical flow of a binocular image computed using FlowNet2.0.
CN202011084248.2A 2020-10-12 2020-10-12 Monocular image depth estimation method Active CN112288788B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011084248.2A CN112288788B (en) 2020-10-12 2020-10-12 Monocular image depth estimation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011084248.2A CN112288788B (en) 2020-10-12 2020-10-12 Monocular image depth estimation method

Publications (2)

Publication Number Publication Date
CN112288788A true CN112288788A (en) 2021-01-29
CN112288788B CN112288788B (en) 2023-04-28

Family

ID=74497002

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011084248.2A Active CN112288788B (en) 2020-10-12 2020-10-12 Monocular image depth estimation method

Country Status (1)

Country Link
CN (1) CN112288788B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114757984A (en) * 2022-04-26 2022-07-15 北京拙河科技有限公司 Scene depth estimation method and device of light field camera
CN116152323A (en) * 2023-04-18 2023-05-23 荣耀终端有限公司 Depth estimation method, monocular depth estimation model generation method and electronic equipment
CN117036439A (en) * 2023-10-09 2023-11-10 广州市大湾区虚拟现实研究院 Single image depth estimation method and system based on multi-scale residual error network
WO2023245321A1 (en) * 2022-06-20 2023-12-28 北京小米移动软件有限公司 Image depth prediction method and apparatus, device, and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107578436A (en) * 2017-08-02 2018-01-12 南京邮电大学 A kind of monocular image depth estimation method based on full convolutional neural networks FCN
CN109377530A (en) * 2018-11-30 2019-02-22 天津大学 A kind of binocular depth estimation method based on deep neural network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107578436A (en) * 2017-08-02 2018-01-12 南京邮电大学 A kind of monocular image depth estimation method based on full convolutional neural networks FCN
CN109377530A (en) * 2018-11-30 2019-02-22 天津大学 A kind of binocular depth estimation method based on deep neural network

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114757984A (en) * 2022-04-26 2022-07-15 北京拙河科技有限公司 Scene depth estimation method and device of light field camera
WO2023245321A1 (en) * 2022-06-20 2023-12-28 北京小米移动软件有限公司 Image depth prediction method and apparatus, device, and storage medium
CN116152323A (en) * 2023-04-18 2023-05-23 荣耀终端有限公司 Depth estimation method, monocular depth estimation model generation method and electronic equipment
CN116152323B (en) * 2023-04-18 2023-09-08 荣耀终端有限公司 Depth estimation method, monocular depth estimation model generation method and electronic equipment
CN117036439A (en) * 2023-10-09 2023-11-10 广州市大湾区虚拟现实研究院 Single image depth estimation method and system based on multi-scale residual error network

Also Published As

Publication number Publication date
CN112288788B (en) 2023-04-28

Similar Documents

Publication Publication Date Title
CN107578436B (en) Monocular image depth estimation method based on full convolution neural network FCN
Zhang et al. Multi-scale single image dehazing using perceptual pyramid deep network
CN112288788B (en) Monocular image depth estimation method
CN110782490B (en) Video depth map estimation method and device with space-time consistency
CN112001960B (en) Monocular image depth estimation method based on multi-scale residual error pyramid attention network model
Laffont et al. Rich intrinsic image decomposition of outdoor scenes from multiple views
CN111462206B (en) Monocular structure light depth imaging method based on convolutional neural network
Zhang et al. Personal photograph enhancement using internet photo collections
CN110910437B (en) Depth prediction method for complex indoor scene
WO2018053952A1 (en) Video image depth extraction method based on scene sample library
CN111626308B (en) Real-time optical flow estimation method based on lightweight convolutional neural network
CN113762358A (en) Semi-supervised learning three-dimensional reconstruction method based on relative deep training
CN112862736B (en) Real-time three-dimensional reconstruction and optimization method based on points
CN114429555A (en) Image density matching method, system, equipment and storage medium from coarse to fine
CN114996814A (en) Furniture design system based on deep learning and three-dimensional reconstruction
CN115222889A (en) 3D reconstruction method and device based on multi-view image and related equipment
CN115423978A (en) Image laser data fusion method based on deep learning and used for building reconstruction
CN116452752A (en) Intestinal wall reconstruction method combining monocular dense SLAM and residual error network
CN116310095A (en) Multi-view three-dimensional reconstruction method based on deep learning
CN112686830B (en) Super-resolution method of single depth map based on image decomposition
CN113421210A (en) Surface point cloud reconstruction method based on binocular stereo vision
CN111260712B (en) Depth estimation method and device based on refocusing polar line graph neighborhood distribution
CN114972937A (en) Feature point detection and descriptor generation method based on deep learning
Wang et al. Decomposed guided dynamic filters for efficient rgb-guided depth completion
CN112365400A (en) Rapid super-resolution reconstruction method for light field angle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant