CN114998411A - Self-supervised monocular depth estimation method and device combined with spatio-temporal enhanced photometric loss - Google Patents

Self-supervised monocular depth estimation method and device combined with spatio-temporal enhanced photometric loss

Info

Publication number
CN114998411A
Authority
CN
China
Prior art keywords
luminosity
reconstruction
depth
max
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210475411.0A
Other languages
Chinese (zh)
Other versions
CN114998411B (en)
Inventor
李嘉茂
张天宇
朱冬晨
张广慧
石文君
刘衍青
张晓林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Institute of Microsystem and Information Technology of CAS
Original Assignee
Shanghai Institute of Microsystem and Information Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Institute of Microsystem and Information Technology of CAS filed Critical Shanghai Institute of Microsystem and Information Technology of CAS
Priority to CN202210475411.0A priority Critical patent/CN114998411B/en
Publication of CN114998411A publication Critical patent/CN114998411A/en
Application granted granted Critical
Publication of CN114998411B publication Critical patent/CN114998411B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/55 Depth or shape recovery from multiple images
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a self-supervised monocular depth estimation method and device combined with spatio-temporal enhanced photometric loss. The method comprises the following steps: acquiring a plurality of adjacent frame images in an image sequence; and inputting the images into a trained deep learning network to obtain depth information and pose information, wherein photometric loss information of the deep learning network is obtained from a spatial transformation model based on depth-aware pixel correspondence, and an omnidirectional automatic mask prevents pixels of moving objects from participating in the photometric error calculation. The method improves the accuracy of the photometric loss and thereby better supervises the learning of the depth network.

Description

Self-supervised monocular depth estimation method and device combined with spatio-temporal enhanced photometric loss
Technical Field
The invention relates to the technical field of computer vision, and in particular to a self-supervised monocular depth estimation method and device combined with spatio-temporal enhanced photometric loss.
Background
Estimating the depth information of a scene from an image, i.e., image depth estimation, is a fundamental and important task in computer vision. A good image depth estimation algorithm can be applied to outdoor driving scenes, indoor small robots, and other fields, and therefore has great application value. During operation, a robot or an autonomous vehicle uses a depth estimation algorithm to obtain scene depth information that assists path planning or obstacle avoidance for its next movement.
Depth estimation from images is divided into supervised and self-supervised approaches. The supervised approach mainly uses a neural network to establish a mapping between an image and a depth map and trains it under the supervision of ground truth, so that the network gradually learns to fit depth. However, because ground truth for supervised training is expensive to acquire, the self-supervised approach has become mainstream in recent years. Compared with methods that require binocular image pairs for training, methods based on image sequences have attracted wide attention from researchers owing to their broader range of applications.
A self-supervised monocular depth framework based on image sequences mainly comprises a depth estimation network and a pose estimation network, which respectively predict the depth of a target frame and the pose transformation between the target frame and a source frame. Using the estimated depth and pose, the source frame is warped into the coordinate system of the target frame to obtain a reconstructed image, and the photometric difference between the target frame and the reconstructed image, i.e., the photometric loss, supervises the simultaneous training of the two networks. As the photometric loss decreases, the depth estimated by the network becomes increasingly accurate.
A spatial transformation model must be adopted when generating the photometric loss. Although the existing spatial transformation model theoretically conforms to rigid-body transformation, errors in the translation vector of the pose introduce depth estimation errors during the computation: the larger the depth, the larger the depth estimation error. In addition, to address the inaccurate photometric loss caused by moving pixels that violate photometric consistency, the main idea of existing approaches is to compute, during training, a binary mask that filters out pixels whose appearance does not change from one frame to the next; however, such a binary mask can only distinguish objects that move in the same direction as the camera.
Disclosure of Invention
The inventors of the present invention found that the reason why a larger depth leads to a larger depth estimation error is as follows. The purpose of the spatial transformation is to transform the corresponding pixels of the target frame and the source frame so that they coincide on the pixel plane; suppose a near point $P_N$ is used to solve for the correspondence between the pixels $p_t$ and $p_s$, as shown in FIG. 1. The principle of self-supervised depth estimation is to make the estimated pose and depth more accurate by minimizing the distance between $p_t$ and $p_s$. For near regions, as shown in FIG. 1, given a certain number of points, the estimated pose only becomes accurate, and the depth only performs well, when $p_t$ and the transformed point $p_F$ roughly coincide. For distant regions, as shown in FIG. 2, the accuracy of the predicted rotation matrix alone is enough to make the photometric error between $p_t$ and $p_s$ small. Consequently, if the photometric error is constructed from the estimated rotation matrix and translation vector without distinguishing near from far, its uncertainty increases greatly, which degrades the depth estimation result.
The technical problem to be solved by the invention is to provide a self-supervised monocular depth estimation method and device combined with spatio-temporal enhanced photometric loss, which can improve the accuracy of the photometric loss and thereby better supervise the learning of the depth network.
The technical solution adopted by the invention to solve this technical problem is as follows: a self-supervised monocular depth estimation method combined with spatio-temporal enhanced photometric loss, comprising the following steps:
acquiring a plurality of adjacent frame images in an image sequence;
inputting the images into a trained deep learning network to obtain depth information and pose information, wherein photometric loss information of the deep learning network is obtained from a spatial transformation model based on depth-aware pixel correspondence, and an omnidirectional automatic mask prevents pixels of moving objects from participating in the photometric error calculation.
Obtaining the photometric loss information from the spatial transformation model based on depth-aware pixel correspondence specifically comprises:
performing a spatial transformation of the far region with a homography matrix and constructing a first reconstruction map, wherein the far region is treated as the plane at infinity;
performing a spatial transformation with the fundamental matrix and constructing a second reconstruction map;
computing a photometric error map from the first reconstruction map and a photometric error map from the second reconstruction map through the two pixel correspondences, and then taking the pixel-wise minimum to obtain the final photometric loss information.
Preventing pixels of a moving object from participating in the photometric error calculation with the omnidirectional automatic mask specifically comprises:
predicting an initial depth and an initial pose of the target frame through a pre-trained network, and generating an initial reconstruction map;
adding perturbation terms to the initial pose and obtaining a plurality of hypothetical reconstruction frames by spatial transformation; generating a plurality of photometric error maps from the hypothetical reconstruction frames combined with the photometric values of the target frame, and obtaining a plurality of binary masks from the photometric error maps;
selecting the pixel-wise minimum of the plurality of binary masks as the final mask.
The perturbation terms are translational perturbation terms comprising $[t_{max}, 0, 0]$, $[-t_{max}, 0, 0]$, $[0, 0, t_{max}]$ and $[0, 0, -t_{max}]$, where $t_{max}$ denotes the maximum value in the initialized translation vector.
The technical scheme adopted by the invention for solving the technical problem is as follows: there is provided an apparatus for self-supervised monocular depth estimation in combination with spatio-temporal enhancement of photometric loss, comprising:
the acquisition module is used for acquiring a plurality of adjacent frame images in the image sequence;
the estimation module is used for inputting the image into a trained deep learning network to obtain depth information and pose information; luminosity loss information of the deep learning network is obtained based on a space transformation model of the depth perception pixel corresponding relation module, and pixels of a moving object are prevented from participating in luminosity error calculation by utilizing the omnidirectional automatic mask module.
The depth perception pixel correspondence module comprises:
the first construction unit is used for carrying out spatial transformation on the far region by using the homography matrix and constructing a first reconstruction map; wherein the far zone treats the far zone as a plane of infinity;
a second construction unit for performing spatial transformation using the basis matrix and constructing a second reconstruction map;
and the luminosity loss information acquisition unit is used for solving a luminosity error map based on the first reconstruction map and a luminosity error map based on the second reconstruction map through the corresponding relation of two pixels, and then selecting the minimum value pixel by pixel to obtain the final luminosity loss information.
The omnidirectional automatic mask module includes:
the initial reconstruction image generating unit is used for predicting the initial depth and the initial pose of the target frame through a pre-training network and generating an initial reconstruction image;
a binarization mask generating unit, which is used for adding interference items to the initial pose and obtaining a plurality of assumed reconstruction frames by utilizing space transformation; generating a plurality of luminosity error maps by using the assumed reconstruction frame and combining the luminosity of the target frame, and obtaining a plurality of binarization masks by using the luminosity error maps;
and the mask selecting unit is used for selecting the minimum value from the plurality of binary masks as a final mask.
The disturbance term is a translational disturbance term, and comprises the following steps: [ t ] of max ,0,0]、[-t max ,0,0]、[0,0,t max ]And [0,0, -t max ]Wherein, t max Representing the maximum value in the initialized translation vector.
Advantageous effects
Owing to the adoption of the above technical solution, the invention has the following advantages and positive effects compared with the prior art. The invention mines the pixel correspondence of distant regions through depth-aware pixel correspondence, alleviating the problem of inaccurate pixel correspondence in far regions, and obtains an omnidirectional binary mask through the omnidirectional automatic masking scheme so that pixels of moving objects do not participate in the photometric error calculation. By improving the spatial transformation and generating an automatic mask for dynamic objects, the invention improves the accuracy of the photometric loss and thereby better supervises the learning of the depth network.
Drawings
FIG. 1 is a schematic diagram of a near point pose solution;
FIG. 2 is a schematic diagram of a remote point pose solution;
FIG. 3 is a schematic representation of the Monodepth2 basic framework;
FIG. 4 is a schematic diagram of the generation of the photometric loss in the first embodiment of the present invention;
FIG. 5 is a schematic diagram of the omnidirectional automatic mask in the first embodiment of the present invention.
Detailed Description
The invention will be further illustrated with reference to the following specific embodiments. It should be understood that these embodiments are for illustrative purposes only and are not intended to limit the scope of the invention. Furthermore, it should be understood that, after reading the teaching of the invention, those skilled in the art may make various changes or modifications to the invention, and such equivalent forms likewise fall within the scope defined by the appended claims of the present application.
The first embodiment of the invention relates to a self-supervised monocular depth estimation method combined with spatio-temporal enhanced photometric loss, comprising: acquiring a plurality of adjacent frame images in an image sequence; and inputting the images into a trained deep learning network to obtain depth information and pose information, wherein photometric loss information of the deep learning network is obtained from a spatial transformation model based on depth-aware pixel correspondence, and an omnidirectional automatic mask prevents pixels of moving objects from participating in the photometric error calculation.
The method of this embodiment can be used directly in general self-supervised monocular depth estimation; any work that takes the SfMLearner framework as its implementation principle can use the method of this embodiment. It is only necessary to replace the spatial transformation part of the original framework with the spatial transformation model based on depth-aware pixel correspondence of this embodiment, and to replace the automatic mask part with the omnidirectional automatic mask of the present application.
The invention is further illustrated below using the basic Monodepth2 framework of Godard et al. as an example.
For easier understanding, the overall framework of Monodepth2 is described first. As shown in FIG. 3, its input is three adjacent RGB frames of a sequence, and its output is the depth of the target frame and the pose transformation between the target frame and the source frames.
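As a purely illustrative aid, the following Python (PyTorch) sketch shows how such a two-network framework is commonly wired; the module names `depth_net` and `pose_net` and the tensor layout are assumptions of this sketch, not elements of the patented method.

```python
import torch

def framework_forward(depth_net, pose_net, frames):
    """Minimal sketch of a Monodepth2-style forward pass.

    frames: dict with 'target', 'prev' and 'next' RGB tensors of shape (B, 3, H, W).
    depth_net maps an image to a dense depth map; pose_net maps a concatenated
    frame pair to a relative pose (both assumed to be ordinary nn.Modules).
    """
    depth_t = depth_net(frames['target'])                       # (B, 1, H, W) target-frame depth
    poses = {k: pose_net(torch.cat([frames['target'], frames[k]], dim=1))
             for k in ('prev', 'next')}                          # pose from target to each source frame
    return depth_t, poses                                        # later used to reconstruct the target frame
```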
The basic framework of this embodiment is the same as that of FIG. 3. Since the improvements of this embodiment mainly concern the generation of the photometric loss by spatial transformation and the automatic mask, these two parts of Monodepth2 are described first.
Monodepth2 uses the same spatial transformation model as SfMLearner. Given the depth $D_t$ of the target frame $I_t$ obtained by the depth network and the pose $T_{t\to s} = [R_{t\to s} \mid t_{t\to s}]$ between the target frame $I_t$ and the source frame $I_s$ obtained by the pose network, a pair of corresponding pixels $p_t$ and $p_s$ in the two frames that project from the same 3D point satisfies

$D_s K^{-1} p_s = T_{t\to s} D_t K^{-1} p_t$

where $K$ is the camera intrinsic matrix. Since monocular depth has scale ambiguity, the following relation is used for the spatial transformation:

$p_s \sim K T_{t\to s} D_t K^{-1} p_t$

In this geometric transformation, $K T_{t\to s} K^{-1}$ is defined as the fundamental matrix $F$ used for the pixel correspondence between frames. This relation is then used to construct a reconstructed frame $\hat{I}_t^F$ by sampling the source frame at the projected pixel locations.
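A minimal PyTorch sketch of this warping step is given below; the tensor shapes, the variable names, and the use of bilinear `grid_sample` sampling are implementation assumptions for illustration, not requirements of the patent.

```python
import torch
import torch.nn.functional as F

def reconstruct_target(source, depth_t, K, K_inv, T_t2s):
    """Warp the source frame into the target view using D_t and T_{t->s}.

    source: (B, 3, H, W), depth_t: (B, 1, H, W), K / K_inv: (B, 3, 3), T_t2s: (B, 3, 4).
    """
    B, _, H, W = source.shape
    device = source.device
    # Homogeneous pixel grid p_t of shape (B, 3, H*W)
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float().view(3, -1)
    pix = pix.unsqueeze(0).expand(B, -1, -1).to(device)
    # Back-project: P = D_t * K^{-1} p_t, then append a row of ones for the rigid transform
    cam_points = depth_t.view(B, 1, -1) * (K_inv @ pix)
    cam_points = torch.cat([cam_points, torch.ones(B, 1, H * W, device=device)], dim=1)
    # Project into the source view: p_s ~ K [R | t] P
    proj = K @ (T_t2s @ cam_points)
    px = proj[:, 0] / (proj[:, 2] + 1e-7)
    py = proj[:, 1] / (proj[:, 2] + 1e-7)
    # Normalize coordinates to [-1, 1] and bilinearly sample the source frame
    grid = torch.stack([2 * px / (W - 1) - 1, 2 * py / (H - 1) - 1], dim=-1).view(B, H, W, 2)
    return F.grid_sample(source, grid, padding_mode='border', align_corners=True)
```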
From the target frame and the reconstructed frame, the photometric loss pe can be constructed; it consists of an L1 term and a Structural Similarity (SSIM) term:

$pe(I_a, I_b) = \frac{\alpha}{2}\bigl(1 - \mathrm{SSIM}(I_a, I_b)\bigr) + (1 - \alpha)\,\lVert I_a - I_b \rVert_1$

where $\alpha$ is a hyper-parameter, set to 0.85 in Monodepth2.
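A sketch of this loss in PyTorch follows; the 3×3 average-pooling SSIM and the constants C1, C2 are the simplifications used in common re-implementations and are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def ssim(x, y, C1=0.01 ** 2, C2=0.03 ** 2):
    """Simplified per-pixel SSIM computed with a 3x3 average-pooling window."""
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (sigma_x + sigma_y + C2)
    return torch.clamp(num / den, 0, 1)

def photometric_error(img_a, img_b, alpha=0.85):
    """pe = alpha/2 * (1 - SSIM) + (1 - alpha) * L1, averaged over channels -> (B, 1, H, W)."""
    l1 = (img_a - img_b).abs().mean(dim=1, keepdim=True)
    ssim_term = (1 - ssim(img_a, img_b)).mean(dim=1, keepdim=True)
    return alpha / 2 * ssim_term + (1 - alpha) * l1
```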
The automatic mask of Monodepth2 mainly addresses the inaccurate photometric loss caused by moving pixels in the image that violate photometric consistency. Its main idea is to filter out, during training, the pixels whose appearance does not change from one frame to the next; the resulting binary mask μ is

$\mu = \bigl[\, pe(I_t, \hat{I}_t) < pe(I_t, I_s) \,\bigr]$

where $[\cdot]$ is the Iverson bracket used to generate the binary mask, $I_t$ is the target frame, $I_s$ is the source frame, and $\hat{I}_t$ is the reconstructed frame resulting from the spatial transformation.
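Expressed with the helpers sketched above (again illustrative only):

```python
# Monodepth2-style auto-mask: keep a pixel only when warping the source frame
# explains it better than the un-warped source frame itself (Iverson bracket).
def auto_mask(target, source, reconstructed, alpha=0.85):
    mu = photometric_error(target, reconstructed, alpha) < photometric_error(target, source, alpha)
    return mu.float()   # 1 where photometric consistency holds, 0 for suspect pixels
```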
For the photometric loss generated by the spatial transformation, this embodiment obtains it from a spatial transformation model based on depth-aware pixel correspondence, as shown in FIG. 4. The details are as follows.

During the spatial transformation, a sufficiently distant region can be regarded as the plane at infinity. Such a plane satisfies

$n^T P + d = 0$

where $n$ is the normal vector of the plane, $P$ is a 3D point on the plane, and $d$ is the distance from the camera center to the plane. Rearranging gives

$-\dfrac{n^T P}{d} = 1$

Substituting this into the spatial transformation relation yields

$p_s \sim K\left(R_{t\to s} - \dfrac{t_{t\to s}\, n^T}{d}\right) D_t K^{-1} p_t$

When $D_t$ tends to infinity, i.e. on the plane at infinity ($d \to \infty$), the translation term vanishes and the relation reduces to

$p_s \sim K R_{t\to s} D_t K^{-1} p_t$

Here $K R_{t\to s} K^{-1}$ is defined as the homography at infinity $H_\infty$. For the distant region, a reconstruction map $\hat{I}_t^H$ is therefore constructed by a spatial transformation that uses only the rotation matrix. For distinction, the reconstruction map obtained with the fundamental matrix is denoted $\hat{I}_t^F$.

Since monocular depth estimates suffer from scale ambiguity, the predicted depth cannot be used directly to choose between the two pixel correspondences. This embodiment therefore designs an adaptive selection: the two photometric error maps are computed from the two correspondences, and the minimum is taken pixel by pixel, giving the final photometric error

$pe_{\mathrm{final}} = \min\bigl( pe(I_t, \hat{I}_t^F),\; pe(I_t, \hat{I}_t^H) \bigr)$
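A sketch of this adaptive selection, reusing the helpers above, is given below; zeroing the translation column to obtain the rotation-only warp is one possible realization of the $H_\infty$ correspondence and is an assumption of this sketch.

```python
import torch

def depth_aware_photometric_error(target, source, depth_t, K, K_inv, T_t2s, alpha=0.85):
    """Pixel-wise minimum of the photometric errors of two reconstructions:
    full pose [R | t] (fundamental-matrix correspondence) vs. rotation only (H_inf)."""
    recon_F = reconstruct_target(source, depth_t, K, K_inv, T_t2s)      # full predicted pose
    T_rot = T_t2s.clone()
    T_rot[:, :, 3] = 0.0                                                # drop the translation
    recon_H = reconstruct_target(source, depth_t, K, K_inv, T_rot)      # rotation-only warp
    pe_F = photometric_error(target, recon_F, alpha)
    pe_H = photometric_error(target, recon_H, alpha)
    return torch.minimum(pe_F, pe_H)                                    # depth-aware photometric error
```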
for the omnidirectional automatic mask, the embodiment directly inputs the image sequence into the module, and after obtaining the mask result, the mask result is applied to the photometric error to shield the unreliable part, as shown in fig. 5, specifically as follows:
the embodiment introduces a Monodepth2 pre-training network to predict the initial of the target frameDepth D init And initial frame pose T init Further generating an initial reconstruction map I init . Because the depth and the pose are accurate, the luminosity error of the region which accords with the luminosity consistency is small, but the potential of the region which does not accord with the luminosity consistency is small.
According to the method, interference items are added to the initial pose, a plurality of interfered poses are introduced, and a plurality of assumed reconstruction frames are obtained after space transformation is utilized. Using these reconstructed frames I i Wherein, i ∈ {1,2, … }, in combination with the luminance of the target frame, a plurality of luminance error maps can be generated, and a plurality of binary masks can be obtained by using the magnitudes of the luminance error values, corresponding to the pixels of the moving object in each direction, as follows:
M i =[pe(I t ,I init ),pe(I t ,I i )]
in order to capture the object moving in each direction, the generated masks are minimized to obtain the final mask, namely:
M oA =min(M 1 ,M 2 ,…)
in the implementation process of the embodiment, only the translation vector is disturbed, and the specific translation disturbance item t i : t 1 =[t max ,0,0]、t 2 =[-t max ,0,0]、t 3 =[0,0,t max ]And t 4 =[0,0,-t max ]Wherein, t max Is the maximum value in the initialized translation vector.
In summary, the invention mines the pixel correspondence of distant regions through depth-aware pixel correspondence, alleviating the problem of inaccurate pixel correspondence in far regions, and obtains an omnidirectional binary mask through the omnidirectional automatic masking scheme so that pixels of moving objects do not participate in the photometric error calculation. By improving the spatial transformation and generating an automatic mask for dynamic objects, the invention improves the accuracy of the photometric loss and thereby better supervises the learning of the depth network. Applying the depth-aware pixel correspondence and the omnidirectional automatic mask of this embodiment to the Monodepth2 framework of Godard et al. therefore yields more accurate monocular depth estimation results.
A second embodiment of the present invention relates to an apparatus for self-supervised monocular depth estimation combined with spatio-temporal enhanced photometric loss, comprising: an acquisition module for acquiring a plurality of adjacent frame images in an image sequence; and an estimation module for inputting the images into a trained deep learning network to obtain depth information and pose information, wherein photometric loss information of the deep learning network is obtained through a depth-aware pixel correspondence module based on a spatial transformation model, and an omnidirectional automatic mask module prevents pixels of moving objects from participating in the photometric error calculation.
The depth-aware pixel correspondence module comprises: a first construction unit for performing a spatial transformation of the far region with a homography matrix and constructing a first reconstruction map, wherein the far region is treated as the plane at infinity; a second construction unit for performing a spatial transformation with the fundamental matrix and constructing a second reconstruction map; and a photometric loss information acquisition unit for computing a photometric error map from the first reconstruction map and a photometric error map from the second reconstruction map through the two pixel correspondences, and then taking the pixel-wise minimum to obtain the final photometric loss information.
The omnidirectional automatic mask module comprises: an initial reconstruction map generating unit for predicting an initial depth and an initial pose of the target frame through a pre-trained network and generating an initial reconstruction map; a binary mask generating unit for adding perturbation terms to the initial pose, obtaining a plurality of hypothetical reconstruction frames by spatial transformation, generating a plurality of photometric error maps from the hypothetical reconstruction frames combined with the photometric values of the target frame, and obtaining a plurality of binary masks from the photometric error maps; and a mask selecting unit for selecting the pixel-wise minimum of the plurality of binary masks as the final mask. The perturbation terms are translational perturbation terms comprising $[t_{max}, 0, 0]$, $[-t_{max}, 0, 0]$, $[0, 0, t_{max}]$ and $[0, 0, -t_{max}]$, where $t_{max}$ denotes the maximum value in the initialized translation vector.

Claims (8)

1. A self-supervised monocular depth estimation method combined with spatio-temporal enhanced photometric loss, characterized by comprising the following steps:
acquiring a plurality of adjacent frame images in an image sequence;
inputting the images into a trained deep learning network to obtain depth information and pose information, wherein photometric loss information of the deep learning network is obtained from a spatial transformation model based on depth-aware pixel correspondence, and an omnidirectional automatic mask prevents pixels of moving objects from participating in the calculation of photometric errors.
2. The self-supervised monocular depth estimation method combined with spatio-temporal enhanced photometric loss according to claim 1, wherein obtaining the photometric loss information from the spatial transformation model based on depth-aware pixel correspondence specifically comprises:
performing a spatial transformation of the far region with a homography matrix and constructing a first reconstruction map, wherein the far region is treated as the plane at infinity;
performing a spatial transformation with the fundamental matrix and constructing a second reconstruction map;
computing a photometric error map from the first reconstruction map and a photometric error map from the second reconstruction map through the two pixel correspondences, and then taking the pixel-wise minimum to obtain the final photometric loss information.
3. The self-supervised monocular depth estimation method combined with spatio-temporal enhanced photometric loss according to claim 1, wherein preventing pixels of a moving object from participating in the photometric error calculation with the omnidirectional automatic mask specifically comprises:
predicting an initial depth and an initial pose of the target frame through a pre-trained network, and generating an initial reconstruction map;
adding perturbation terms to the initial pose and obtaining a plurality of hypothetical reconstruction frames by spatial transformation; generating a plurality of photometric error maps from the hypothetical reconstruction frames combined with the photometric values of the target frame, and obtaining a plurality of binary masks from the photometric error maps;
selecting the pixel-wise minimum of the plurality of binary masks as the final mask.
4. The method according to claim 3, wherein the perturbation terms are translational perturbation terms comprising $[t_{max}, 0, 0]$, $[-t_{max}, 0, 0]$, $[0, 0, t_{max}]$ and $[0, 0, -t_{max}]$, where $t_{max}$ denotes the maximum value in the initialized translation vector.
5. An apparatus for self-supervised monocular depth estimation combined with spatio-temporal enhanced photometric loss, characterized by comprising:
an acquisition module for acquiring a plurality of adjacent frame images in an image sequence;
an estimation module for inputting the images into a trained deep learning network to obtain depth information and pose information;
wherein photometric loss information of the deep learning network is obtained through a depth-aware pixel correspondence module based on a spatial transformation model, and an omnidirectional automatic mask module prevents pixels of moving objects from participating in the photometric error calculation.
6. The apparatus according to claim 5, wherein the depth-aware pixel correspondence module comprises:
a first construction unit for performing a spatial transformation of the far region with a homography matrix and constructing a first reconstruction map;
wherein the far region is treated as the plane at infinity;
a second construction unit for performing a spatial transformation with the fundamental matrix and constructing a second reconstruction map;
a photometric loss information acquisition unit for computing a photometric error map from the first reconstruction map and a photometric error map from the second reconstruction map through the two pixel correspondences, and then taking the pixel-wise minimum to obtain the final photometric loss information.
7. The apparatus for self-supervised monocular depth estimation combined with spatio-temporal enhanced photometric loss according to claim 5, wherein the omnidirectional automatic mask module comprises:
an initial reconstruction map generating unit for predicting an initial depth and an initial pose of the target frame through a pre-trained network and generating an initial reconstruction map;
a binary mask generating unit for adding perturbation terms to the initial pose and obtaining a plurality of hypothetical reconstruction frames by spatial transformation, generating a plurality of photometric error maps from the hypothetical reconstruction frames combined with the photometric values of the target frame, and obtaining a plurality of binary masks from the photometric error maps;
a mask selecting unit for selecting the pixel-wise minimum of the plurality of binary masks as the final mask.
8. The apparatus according to claim 7, wherein the perturbation terms are translational perturbation terms comprising $[t_{max}, 0, 0]$, $[-t_{max}, 0, 0]$, $[0, 0, t_{max}]$ and $[0, 0, -t_{max}]$, where $t_{max}$ denotes the maximum value in the initialized translation vector.
CN202210475411.0A 2022-04-29 2022-04-29 Self-supervised monocular depth estimation method and device combined with spatio-temporal enhanced photometric loss Active CN114998411B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210475411.0A CN114998411B (en) 2022-04-29 2022-04-29 Self-supervised monocular depth estimation method and device combined with spatio-temporal enhanced photometric loss

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210475411.0A CN114998411B (en) 2022-04-29 2022-04-29 Self-supervised monocular depth estimation method and device combined with spatio-temporal enhanced photometric loss

Publications (2)

Publication Number Publication Date
CN114998411A true CN114998411A (en) 2022-09-02
CN114998411B CN114998411B (en) 2024-01-09

Family

ID=83025390

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210475411.0A Active CN114998411B (en) 2022-04-29 2022-04-29 Self-supervision monocular depth estimation method and device combining space-time enhancement luminosity loss

Country Status (1)

Country Link
CN (1) CN114998411B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116245927A (en) * 2023-02-09 2023-06-09 湖北工业大学 ConvDepth-based self-supervision monocular depth estimation method and system

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110264509A (en) * 2018-04-27 2019-09-20 腾讯科技(深圳)有限公司 Determine the method, apparatus and its storage medium of the pose of image-capturing apparatus
US20190356905A1 (en) * 2018-05-17 2019-11-21 Niantic, Inc. Self-supervised training of a depth estimation system
CN111260680A (en) * 2020-01-13 2020-06-09 杭州电子科技大学 RGBD camera-based unsupervised pose estimation network construction method
US20200211206A1 (en) * 2018-12-27 2020-07-02 Baidu Usa Llc Joint learning of geometry and motion with three-dimensional holistic understanding
CN111369608A (en) * 2020-05-29 2020-07-03 南京晓庄学院 Visual odometer method based on image depth estimation
CN111739078A (en) * 2020-06-15 2020-10-02 大连理工大学 Monocular unsupervised depth estimation method based on context attention mechanism
CN111783582A (en) * 2020-06-22 2020-10-16 东南大学 Unsupervised monocular depth estimation algorithm based on deep learning
US20210118184A1 (en) * 2019-10-17 2021-04-22 Toyota Research Institute, Inc. Systems and methods for self-supervised scale-aware training of a model for monocular depth estimation
CN113160390A (en) * 2021-04-28 2021-07-23 北京理工大学 Three-dimensional dense reconstruction method and system
CN113240722A (en) * 2021-04-28 2021-08-10 浙江大学 Self-supervision depth estimation method based on multi-frame attention
CN113313732A (en) * 2021-06-25 2021-08-27 南京航空航天大学 Forward-looking scene depth estimation method based on self-supervision learning
CN113450410A (en) * 2021-06-29 2021-09-28 浙江大学 Monocular depth and pose joint estimation method based on epipolar geometry
CN113570658A (en) * 2021-06-10 2021-10-29 西安电子科技大学 Monocular video depth estimation method based on depth convolutional network
US20210398301A1 (en) * 2020-06-17 2021-12-23 Toyota Research Institute, Inc. Camera agnostic depth network
CN114022799A (en) * 2021-09-23 2022-02-08 中国人民解放军军事科学院国防科技创新研究院 Self-supervision monocular depth estimation method and device
CN114170286A (en) * 2021-11-04 2022-03-11 西安理工大学 Monocular depth estimation method based on unsupervised depth learning

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110264509A (en) * 2018-04-27 2019-09-20 腾讯科技(深圳)有限公司 Determine the method, apparatus and its storage medium of the pose of image-capturing apparatus
US20190356905A1 (en) * 2018-05-17 2019-11-21 Niantic, Inc. Self-supervised training of a depth estimation system
US20200211206A1 (en) * 2018-12-27 2020-07-02 Baidu Usa Llc Joint learning of geometry and motion with three-dimensional holistic understanding
US20210118184A1 (en) * 2019-10-17 2021-04-22 Toyota Research Institute, Inc. Systems and methods for self-supervised scale-aware training of a model for monocular depth estimation
CN111260680A (en) * 2020-01-13 2020-06-09 杭州电子科技大学 RGBD camera-based unsupervised pose estimation network construction method
CN111369608A (en) * 2020-05-29 2020-07-03 南京晓庄学院 Visual odometer method based on image depth estimation
US20210390723A1 (en) * 2020-06-15 2021-12-16 Dalian University Of Technology Monocular unsupervised depth estimation method based on contextual attention mechanism
CN111739078A (en) * 2020-06-15 2020-10-02 大连理工大学 Monocular unsupervised depth estimation method based on context attention mechanism
US20210398301A1 (en) * 2020-06-17 2021-12-23 Toyota Research Institute, Inc. Camera agnostic depth network
CN111783582A (en) * 2020-06-22 2020-10-16 东南大学 Unsupervised monocular depth estimation algorithm based on deep learning
CN113160390A (en) * 2021-04-28 2021-07-23 北京理工大学 Three-dimensional dense reconstruction method and system
CN113240722A (en) * 2021-04-28 2021-08-10 浙江大学 Self-supervision depth estimation method based on multi-frame attention
CN113570658A (en) * 2021-06-10 2021-10-29 西安电子科技大学 Monocular video depth estimation method based on depth convolutional network
CN113313732A (en) * 2021-06-25 2021-08-27 南京航空航天大学 Forward-looking scene depth estimation method based on self-supervision learning
CN113450410A (en) * 2021-06-29 2021-09-28 浙江大学 Monocular depth and pose joint estimation method based on epipolar geometry
CN114022799A (en) * 2021-09-23 2022-02-08 中国人民解放军军事科学院国防科技创新研究院 Self-supervision monocular depth estimation method and device
CN114170286A (en) * 2021-11-04 2022-03-11 西安理工大学 Monocular depth estimation method based on unsupervised depth learning

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
T. ZHOU et al.: "Unsupervised learning of depth and ego-motion from video", 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1851-1858 *
姜昊辰 et al.: "Indoor dynamic scene RGB-D SLAM algorithm based on semantic priors and depth constraints", Information and Control (《信息与控制》), vol. 50, 2021, pages 275-286 *
岑仕杰 et al.: "Monocular depth estimation combining attention and unsupervised deep learning", Journal of Guangdong University of Technology (《广东工业大学学报》), no. 04, pages 35-41 *
胡智程: "Monocular image depth estimation based on unsupervised learning", China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》), 2021, pages 138-615 *
詹雁: "Research on image depth estimation methods based on domain adaptation", China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》), 2021, pages 138-811 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116245927A (en) * 2023-02-09 2023-06-09 湖北工业大学 ConvDepth-based self-supervision monocular depth estimation method and system
CN116245927B (en) * 2023-02-09 2024-01-16 湖北工业大学 ConvDepth-based self-supervision monocular depth estimation method and system

Also Published As

Publication number Publication date
CN114998411B (en) 2024-01-09

Similar Documents

Publication Publication Date Title
CN111325794B (en) Visual simultaneous localization and map construction method based on depth convolution self-encoder
WO2020046066A1 (en) Method for training convolutional neural network to reconstruct an image and system for depth map generation from an image
CN112505065B (en) Method for detecting surface defects of large part by indoor unmanned aerial vehicle
CN109815847B (en) Visual SLAM method based on semantic constraint
Zuo et al. Devo: Depth-event camera visual odometry in challenging conditions
CN111783582A (en) Unsupervised monocular depth estimation algorithm based on deep learning
US8867826B2 (en) Disparity estimation for misaligned stereo image pairs
CN111462210A (en) Monocular line feature map construction method based on epipolar constraint
Shreyas et al. 3D object detection and tracking methods using deep learning for computer vision applications
Zhong et al. WF-SLAM: A robust VSLAM for dynamic scenarios via weighted features
CN114332394A (en) Semantic information assistance-based dynamic scene three-dimensional reconstruction method
CN112686952A (en) Image optical flow computing system, method and application
CN116452752A (en) Intestinal wall reconstruction method combining monocular dense SLAM and residual error network
CN114998411A (en) Self-supervised monocular depth estimation method and device combined with spatio-temporal enhanced photometric loss
Yang et al. SAM-Net: Semantic probabilistic and attention mechanisms of dynamic objects for self-supervised depth and camera pose estimation in visual odometry applications
Zhang et al. Depth map prediction from a single image with generative adversarial nets
Bhutani et al. Unsupervised Depth and Confidence Prediction from Monocular Images using Bayesian Inference
CN117011660A (en) Dot line feature SLAM method for fusing depth information in low-texture scene
Buck et al. Capturing uncertainty in monocular depth estimation: Towards fuzzy voxel maps
Wirges et al. Self-supervised flow estimation using geometric regularization with applications to camera image and grid map sequences
CN114935316B (en) Standard depth image generation method based on optical tracking and monocular vision
CN112308893B (en) Monocular depth estimation method based on iterative search strategy
Liu et al. Binocular depth estimation using convolutional neural network with Siamese branches
Taguchi et al. Unsupervised Simultaneous Learning for Camera Re-Localization and Depth Estimation from Video
Liu et al. Stereo Visual Odometry with Information Enhancement at Feature Points

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant