CN114998411B - Self-supervision monocular depth estimation method and device combining space-time enhancement luminosity loss - Google Patents

Self-supervision monocular depth estimation method and device combining space-time enhancement luminosity loss Download PDF

Info

Publication number
CN114998411B
Authority
CN
China
Prior art keywords
luminosity
depth
max
reconstruction
self
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210475411.0A
Other languages
Chinese (zh)
Other versions
CN114998411A (en)
Inventor
李嘉茂
张天宇
朱冬晨
张广慧
石文君
刘衍青
张晓林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Institute of Microsystem and Information Technology of CAS
Original Assignee
Shanghai Institute of Microsystem and Information Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Institute of Microsystem and Information Technology of CAS filed Critical Shanghai Institute of Microsystem and Information Technology of CAS
Priority to CN202210475411.0A priority Critical patent/CN114998411B/en
Publication of CN114998411A publication Critical patent/CN114998411A/en
Application granted granted Critical
Publication of CN114998411B publication Critical patent/CN114998411B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a self-supervised monocular depth estimation method and device combining a spatio-temporally enhanced photometric loss. The method comprises the following steps: acquiring a plurality of adjacent frame images from an image sequence; and inputting the images into a trained deep learning network to obtain depth information and pose information, wherein the photometric loss of the deep learning network is obtained from a spatial transformation model based on depth-aware pixel correspondence, and an omnidirectional auto-mask is used to prevent pixels of moving objects from participating in the photometric error calculation. The invention can improve the accuracy of the photometric loss and thereby better supervise the learning of the depth network.

Description

Self-supervision monocular depth estimation method and device combining space-time enhancement luminosity loss
Technical Field
The invention relates to the technical field of computer vision, and in particular to a self-supervised monocular depth estimation method and device combining a spatio-temporally enhanced photometric loss.
Background
Estimating the depth of a scene from an image, i.e., image depth estimation, is a fundamental and important task in computer vision. A good image depth estimation algorithm can be applied to outdoor driving scenes, indoor mobile robots and other fields, and therefore has great application value. During the operation of a robot or an autonomous vehicle, a depth estimation algorithm provides scene depth information that assists path planning and obstacle avoidance for the next movement.
Image depth estimation methods can be divided into supervised and self-supervised methods. Supervised methods mainly use a neural network to establish a mapping from the image to the depth map and train it under the supervision of ground-truth depth, so that the network gradually learns to regress depth. However, because ground-truth depth is expensive to acquire, self-supervised methods have become the mainstream in recent years. Among them, compared with methods that require binocular image pairs for training, methods based on monocular image sequences have attracted wide attention from researchers because of their broader applicability.
A sequence-based self-supervised monocular depth framework mainly consists of a depth estimation network and a pose estimation network, which respectively predict the depth of the target frame and the pose transformation between the target frame and a source frame. With the estimated depth and pose, the source frame can be warped into the coordinate system of the target frame to obtain a reconstructed image, and the photometric difference between the target frame and the reconstructed image, i.e., the photometric loss, can be used to supervise the joint training of the two networks. As the photometric loss decreases, the depth estimated by the network becomes increasingly accurate.
The photometric loss is generated by a spatial transformation model. The existing spatial transformation model follows the ideal rigid transformation, but during this computation errors in the translation vector of the estimated pose introduce depth estimation errors: the larger the depth, the larger the error of the estimated depth. In addition, to address the inaccurate photometric loss caused by moving pixels that violate photometric consistency, the main idea of existing approaches is to generate, during training, a binary mask that filters out pixels whose photometric appearance does not change from one frame to the next; however, such a binary mask can only identify objects moving in the same direction as the camera.
Disclosure of Invention
The inventors found that the reason why the depth estimation error grows with depth is as follows. The purpose of the spatial transformation is to make corresponding pixels in the target frame and the source frame coincide on the pixel plane after the transformation; suppose a near point P_N is used to solve for the corresponding pixels p_t and p_s, as shown in FIG. 1. The principle of self-supervised depth estimation is to make the estimated pose and depth more accurate by minimizing the distance between p_t and p_s. For the near region, as shown in FIG. 1, with the 3D point fixed, only when p_t coincides with the transformed point can the estimated pose become more accurate and the depth perform well. For the far region, as shown in FIG. 2, an accurate predicted rotation matrix alone is sufficient to make p_t and p_s coincide. Therefore, if the photometric error is constructed with the estimated rotation matrix and translation vector without distinguishing near regions from far regions, the uncertainty of the photometric error increases greatly, and the depth estimation results deteriorate.
The technical problem to be solved by the invention is to provide a self-supervised monocular depth estimation method and device combining a spatio-temporally enhanced photometric loss, which can improve the accuracy of the photometric loss and thereby better supervise the learning of the depth network.
The technical solution adopted to solve the above technical problem is as follows: a self-supervised monocular depth estimation method combining a spatio-temporally enhanced photometric loss is provided, comprising the following steps:
acquiring a plurality of adjacent frame images from an image sequence;
inputting the images into a trained deep learning network to obtain depth information and pose information, wherein the photometric loss of the deep learning network is obtained from a spatial transformation model based on depth-aware pixel correspondence, and an omnidirectional auto-mask is used to prevent pixels of moving objects from participating in the photometric error calculation.
Obtaining the photometric loss from the spatial transformation model based on depth-aware pixel correspondence specifically comprises:
performing a spatial transformation on far regions using a homography matrix and constructing a first reconstructed image, wherein the far regions are treated as a plane at infinity;
performing a spatial transformation using the fundamental matrix and constructing a second reconstructed image;
computing a photometric error map based on the first reconstructed image and a photometric error map based on the second reconstructed image through the two pixel correspondences, and then taking the pixel-wise minimum to obtain the final photometric loss.
Using the omnidirectional auto-mask to prevent pixels of moving objects from participating in the photometric error calculation specifically comprises:
predicting the initial depth and initial pose of the target frame with a pre-trained network, and generating an initial reconstructed image;
adding disturbance terms to the initial pose, and obtaining a plurality of hypothesized reconstructed frames through spatial transformation; generating a plurality of photometric error maps from the hypothesized reconstructed frames and the target frame, and obtaining a plurality of binary masks from the photometric error maps;
selecting the pixel-wise minimum over the plurality of binary masks as the final mask.
The disturbance terms are translational disturbance terms, comprising [t_max, 0, 0], [-t_max, 0, 0], [0, 0, t_max] and [0, 0, -t_max], where t_max denotes the maximum component of the initialized translation vector.
The technical solution adopted to solve the above technical problem further provides a self-supervised monocular depth estimation device combining a spatio-temporally enhanced photometric loss, comprising:
an acquisition module, configured to acquire a plurality of adjacent frame images from an image sequence;
an estimation module, configured to input the images into a trained deep learning network to obtain depth information and pose information, wherein the photometric loss of the deep learning network is obtained from a spatial transformation model provided by a depth-aware pixel correspondence module, and an omnidirectional auto-mask module is used to prevent pixels of moving objects from participating in the photometric error calculation.
The depth-aware pixel correspondence module comprises:
a first construction unit, configured to perform a spatial transformation on far regions using a homography matrix and construct a first reconstructed image, wherein the far regions are treated as a plane at infinity;
a second construction unit, configured to perform a spatial transformation using the fundamental matrix and construct a second reconstructed image;
a photometric loss acquisition unit, configured to compute a photometric error map based on the first reconstructed image and a photometric error map based on the second reconstructed image through the two pixel correspondences, and then take the pixel-wise minimum to obtain the final photometric loss.
The omnidirectional auto-mask module comprises:
an initial reconstruction generating unit, configured to predict the initial depth and initial pose of the target frame with a pre-trained network and generate an initial reconstructed image;
a binary mask generating unit, configured to add disturbance terms to the initial pose, obtain a plurality of hypothesized reconstructed frames through spatial transformation, generate a plurality of photometric error maps from the hypothesized reconstructed frames and the target frame, and obtain a plurality of binary masks from the photometric error maps;
a mask selecting unit, configured to select the pixel-wise minimum over the plurality of binary masks as the final mask.
The disturbance terms are translational disturbance terms, comprising [t_max, 0, 0], [-t_max, 0, 0], [0, 0, t_max] and [0, 0, -t_max], where t_max denotes the maximum component of the initialized translation vector.
Advantageous effects
Owing to the adoption of the above technical solution, compared with the prior art, the invention has the following advantages and positive effects: the invention uses depth-aware pixel correspondence to mine the pixel correspondence of far regions, alleviating the problem of inaccurate pixel correspondence in far regions, and uses an omnidirectional auto-mask to obtain an omnidirectional binary mask that prevents pixels of moving objects from participating in the photometric error calculation. By improving the spatial transformation and generating an auto-mask for dynamic objects, the invention improves the accuracy of the photometric loss and thereby better supervises the learning of the depth network.
Drawings
FIG. 1 is a schematic diagram of pose solving for a near point;
FIG. 2 is a schematic diagram of pose solving for a far point;
FIG. 3 is a schematic diagram of the basic Monodepth2 framework;
FIG. 4 is a schematic diagram of the generation of the photometric loss in the first embodiment of the invention;
FIG. 5 is a schematic diagram of the omnidirectional auto-mask in the first embodiment of the invention.
Detailed Description
The invention will be further illustrated below with reference to specific embodiments. It should be understood that these embodiments are merely illustrative of the invention and are not intended to limit its scope. In addition, it should be understood that, after reading the teachings of the invention, those skilled in the art may make various changes or modifications, and such equivalents likewise fall within the scope defined by the claims appended hereto.
A first embodiment of the invention relates to a self-supervised monocular depth estimation method combining a spatio-temporally enhanced photometric loss, comprising the following steps: acquiring a plurality of adjacent frame images from an image sequence; and inputting the images into a trained deep learning network to obtain depth information and pose information, wherein the photometric loss of the deep learning network is obtained from a spatial transformation model based on depth-aware pixel correspondence, and an omnidirectional auto-mask is used to prevent pixels of moving objects from participating in the photometric error calculation.
The method of this embodiment can be used directly in general self-supervised monocular depth estimation; any work that adopts the SfMLearner principle as its framework can use the method of this embodiment. The spatial transformation part of the original framework adopts the spatial transformation model based on depth-aware pixel correspondence, and the auto-mask part adopts the omnidirectional auto-mask.
The invention is further illustrated below using the Monodepth2 framework of Godard et al.
For ease of understanding, the overall framework of Monodepth2 is described first. As shown in FIG. 3, its input is the RGB images of three adjacent frames in the sequence, and its output is the depth of the target frame and the pose transformations between the target frame and the source frames.
The basic framework of this embodiment is the same as that of FIG. 3. Since this embodiment mainly improves the way the spatial transformation generates the photometric loss and the auto-mask, these two parts of Monodepth2 are first described in detail:
Monodepth2 uses the same spatial transformation model as SfMLearner, based on the depth D_t of the target frame I_t and the pose T_{t→s} = [R_{t→s} | t_{t→s}] between the target frame I_t and the source frame I_s. For corresponding pixels p_t and p_s in the target frame and the source frame, if they correspond to the same 3D point, they should satisfy:
D_s K^{-1} p_s = T_{t→s} D_t K^{-1} p_t
where K is the camera intrinsic matrix. Since monocular depth has scale ambiguity, this can be rewritten as the following spatial transformation:
p_s ~ K T_{t→s} D_t K^{-1} p_t
In this spatial-geometric transformation, K T_{t→s} K^{-1} is defined as the fundamental matrix F that describes the pixel correspondence between frames, and this relationship can then be used to construct the reconstructed frame I_{s→t}.
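For illustration only, this spatial transformation can be sketched as an inverse-warping operation in PyTorch; the function name reconstruct_from_source and the tensor shapes are assumptions made for this example, not the patented implementation.

```python
# Illustrative sketch only: inverse warping that realizes p_s ~ K T_{t->s} D_t K^{-1} p_t.
import torch
import torch.nn.functional as F

def reconstruct_from_source(I_s, D_t, K, T_t2s):
    """I_s: (B,3,H,W) source image, D_t: (B,1,H,W) target depth,
    K: (B,3,3) intrinsics, T_t2s: (B,4,4) pose from target to source."""
    B, _, H, W = I_s.shape
    # homogeneous pixel grid p_t of the target frame
    ys, xs = torch.meshgrid(torch.arange(H, dtype=I_s.dtype),
                            torch.arange(W, dtype=I_s.dtype), indexing="ij")
    p_t = torch.stack([xs, ys, torch.ones_like(xs)], 0).reshape(1, 3, -1).expand(B, -1, -1)
    # back-project: P = D_t * K^{-1} p_t, then make homogeneous
    P = torch.linalg.inv(K) @ p_t * D_t.reshape(B, 1, -1)
    P = torch.cat([P, torch.ones(B, 1, H * W, dtype=I_s.dtype)], dim=1)
    # rigid transform into the source frame and project with K
    proj = K @ (T_t2s @ P)[:, :3, :]
    px = proj[:, :2, :] / proj[:, 2:3, :].clamp(min=1e-6)
    # normalise to [-1, 1] and bilinearly sample the source image
    grid = torch.stack([px[:, 0] / (W - 1) * 2 - 1,
                        px[:, 1] / (H - 1) * 2 - 1], dim=-1).reshape(B, H, W, 2)
    return F.grid_sample(I_s, grid, padding_mode="border", align_corners=True)
```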
From the target frame and the reconstructed frame, a photometric loss pe can be constructed, consisting of an L1 error and a structural similarity (SSIM) error, as follows:
pe(I_a, I_b) = (α/2) (1 - SSIM(I_a, I_b)) + (1 - α) ||I_a - I_b||_1
where α is a hyperparameter, set to 0.85 in Monodepth2.
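For illustration only, a sketch of the photometric error pe under the stated setting (α = 0.85), using the 3×3 average-pooled SSIM common in Monodepth2 re-implementations; the function names are assumptions.

```python
# Illustrative sketch only: pe = (alpha/2) * (1 - SSIM) + (1 - alpha) * L1, alpha = 0.85.
import torch
import torch.nn.functional as F

def ssim(x, y, C1=0.01 ** 2, C2=0.03 ** 2):
    # simplified SSIM over a 3x3 average-pooled window
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (sigma_x + sigma_y + C2)
    return (num / den).clamp(0, 1)

def photometric_error(I_a, I_b, alpha=0.85):
    """Per-pixel photometric error map pe(I_a, I_b): (B,1,H,W)."""
    l1 = (I_a - I_b).abs().mean(1, keepdim=True)
    dssim = ((1 - ssim(I_a, I_b)) / 2).mean(1, keepdim=True)
    return alpha * dssim + (1 - alpha) * l1
```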
The auto-mask of Monodepth2 is mainly used to address the inaccurate photometric loss caused by moving pixels in the image that violate photometric consistency. Its main idea is to find, during training, the pixels whose photometric appearance does not change from one frame to the next; the generated binary mask μ is:
μ = [ pe(I_t, I_{s→t}) < pe(I_t, I_s) ]
where [·] is the Iverson bracket used to generate the binary mask, I_t is the target frame, I_s is the source frame, and I_{s→t} is the reconstructed frame obtained by the spatial transformation.
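For illustration only, a sketch of this auto-mask, reusing the photometric_error function assumed above and taking the per-pixel minimum over source frames as Monodepth2 does; names and shapes are assumptions.

```python
# Illustrative sketch only: binary auto-mask mu via the Iverson bracket.
import torch

def auto_mask(I_t, I_srcs, I_recs):
    """I_srcs: list of source frames; I_recs: list of their spatially transformed reconstructions."""
    rec_err = torch.min(torch.stack([photometric_error(I_t, r) for r in I_recs]), dim=0).values
    src_err = torch.min(torch.stack([photometric_error(I_t, s) for s in I_srcs]), dim=0).values
    return (rec_err < src_err).float()  # 1 where the reconstruction beats the raw source frame
```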
For the spatial transformation that generates the photometric loss, this embodiment obtains the photometric loss from a spatial transformation model based on depth-aware pixel correspondence. As shown in FIG. 4, the specific steps are as follows:
In the spatial transformation, a sufficiently distant region can be regarded as a plane at infinity, and this plane satisfies:
n^T P + D = 0
where n is the normal vector of the plane, P is a three-dimensional point on the plane, and D is the depth of the point (for the point corresponding to pixel p_t, D equals its depth D_t); rearranging gives -(n^T P)/D = 1. Substituting this into the spatial transformation relation yields:
p_s ~ K (R_{t→s} - t_{t→s} n^T / D) D_t K^{-1} p_t
When the depth D_t tends to infinity, i.e., for the plane at infinity:
p_s ~ K R_{t→s} D_t K^{-1} p_t
Here K R_{t→s} K^{-1} is defined as the homography at infinity H_∞, so that for far regions a reconstructed image, denoted I_H, can be constructed by a spatial transformation that uses only the rotation matrix; for distinction, the reconstructed image obtained with the fundamental matrix is denoted I_F. Because the depth estimated by a monocular network has scale ambiguity, the correct one of the two pixel correspondences cannot be selected directly from the predicted depth. Therefore, this embodiment designs an adaptive selection method: two photometric error maps are computed through the two pixel correspondences, and the pixel-wise minimum is then taken, i.e., the final photometric error is
pe_final = min( pe(I_t, I_F), pe(I_t, I_H) ).
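For illustration only, the adaptive selection can be sketched by reusing the two helper functions assumed above: the full pose gives the fundamental-matrix correspondence, and zeroing the translation gives the rotation-only (infinite-homography) correspondence; names are assumptions.

```python
# Illustrative sketch only: depth-aware pixel correspondence with per-pixel minimum selection.
import torch

def depth_aware_photometric_loss(I_t, I_s, D_t, K, T_t2s):
    T_rot_only = T_t2s.clone()
    T_rot_only[:, :3, 3] = 0.0                                   # keep R only: p_s ~ K R K^{-1} p_t
    I_rec_F = reconstruct_from_source(I_s, D_t, K, T_t2s)        # full-pose (fundamental-matrix) warp
    I_rec_H = reconstruct_from_source(I_s, D_t, K, T_rot_only)   # rotation-only (infinite homography) warp
    pe_F = photometric_error(I_t, I_rec_F)
    pe_H = photometric_error(I_t, I_rec_H)
    return torch.minimum(pe_F, pe_H)                             # final photometric error map
```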
For the omnidirectional auto-mask, in this embodiment the image sequence is fed directly into this module; after the mask is obtained, it is applied to the photometric error to mask out the unreliable parts, as shown in FIG. 5. The details are as follows:
In this embodiment, a pre-trained Monodepth2 network is introduced to predict the initial depth D_init of the target frame and the initial pose T_init, and then to generate an initial reconstructed image I_init. Because this depth and pose are already fairly accurate, the photometric error of regions that satisfy photometric consistency is already small, whereas regions that violate photometric consistency may still reach a smaller error under a different pose.
Based on this idea, disturbance terms are added to the initial pose to introduce several perturbed poses, and several hypothesized reconstructed frames are obtained after spatial transformation. Using these reconstructed frames I_i, where i ∈ {1, 2, …}, together with the photometric appearance of the target frame, several photometric error maps can be generated, and from them several binary masks are obtained. The mask corresponding to the moving pixels in each direction is:
M_i = [ pe(I_t, I_init) < pe(I_t, I_i) ]
To capture objects moving in all directions, the pixel-wise minimum over the generated masks is taken as the final mask, i.e.:
M_oA = min(M_1, M_2, …)
In the implementation of this embodiment, only the translation vector is perturbed, and the specific translational disturbance terms t_i are: t_1 = [t_max, 0, 0], t_2 = [-t_max, 0, 0], t_3 = [0, 0, t_max] and t_4 = [0, 0, -t_max], where t_max is the maximum component of the initialized translation vector.
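For illustration only, the omnidirectional auto-mask can be sketched as follows, reusing the helper functions assumed above; the pre-trained network supplying D_init and T_init is assumed to be available, and variable names are illustrative.

```python
# Illustrative sketch only: omnidirectional auto-mask via translational disturbance terms.
import torch

def omnidirectional_mask(I_t, I_s, D_init, K, T_init):
    t_max = T_init[:, :3, 3].abs().max().item()       # largest component of the initial translation
    pe_init = photometric_error(I_t, reconstruct_from_source(I_s, D_init, K, T_init))
    masks = []
    for delta in ([t_max, 0, 0], [-t_max, 0, 0], [0, 0, t_max], [0, 0, -t_max]):
        T_i = T_init.clone()
        T_i[:, :3, 3] += torch.tensor(delta, dtype=T_init.dtype)  # perturbed pose hypothesis
        pe_i = photometric_error(I_t, reconstruct_from_source(I_s, D_init, K, T_i))
        masks.append((pe_init < pe_i).float())         # M_i: Iverson bracket [pe_init < pe_i]
    return torch.min(torch.stack(masks), dim=0).values # final mask M_oA
```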
It is not difficult to see that the invention uses depth-aware pixel correspondence to mine the pixel correspondence of far regions, alleviating the problem of inaccurate pixel correspondence in far regions, and uses an omnidirectional auto-mask to obtain an omnidirectional binary mask that prevents pixels of moving objects from participating in the photometric error calculation. By improving the spatial transformation and generating an auto-mask for dynamic objects, the invention improves the accuracy of the photometric loss and thereby better supervises the learning of the depth network. Accordingly, applying the depth-aware pixel correspondence and the omnidirectional auto-mask of this embodiment to the Monodepth2 framework of Godard et al. yields monocular depth estimation results with higher accuracy.
A second embodiment of the invention relates to a self-supervised monocular depth estimation device combining a spatio-temporally enhanced photometric loss, comprising: an acquisition module, configured to acquire a plurality of adjacent frame images from an image sequence; and an estimation module, configured to input the images into a trained deep learning network to obtain depth information and pose information, wherein the photometric loss of the deep learning network is obtained from a spatial transformation model provided by a depth-aware pixel correspondence module, and an omnidirectional auto-mask module is used to prevent pixels of moving objects from participating in the photometric error calculation.
The depth-aware pixel correspondence module comprises: a first construction unit, configured to perform a spatial transformation on far regions using a homography matrix and construct a first reconstructed image, wherein the far regions are treated as a plane at infinity; a second construction unit, configured to perform a spatial transformation using the fundamental matrix and construct a second reconstructed image; and a photometric loss acquisition unit, configured to compute a photometric error map based on the first reconstructed image and a photometric error map based on the second reconstructed image through the two pixel correspondences, and then take the pixel-wise minimum to obtain the final photometric loss.
The omnidirectional auto-mask module comprises: an initial reconstruction generating unit, configured to predict the initial depth and initial pose of the target frame with a pre-trained network and generate an initial reconstructed image; a binary mask generating unit, configured to add disturbance terms to the initial pose, obtain a plurality of hypothesized reconstructed frames through spatial transformation, generate a plurality of photometric error maps from the hypothesized reconstructed frames and the target frame, and obtain a plurality of binary masks from the photometric error maps; and a mask selecting unit, configured to select the pixel-wise minimum over the plurality of binary masks as the final mask. The disturbance terms are translational disturbance terms, comprising [t_max, 0, 0], [-t_max, 0, 0], [0, 0, t_max] and [0, 0, -t_max], where t_max denotes the maximum component of the initialized translation vector.

Claims (6)

1. A self-supervised monocular depth estimation method combining a spatio-temporally enhanced photometric loss, comprising the following steps:
acquiring a plurality of adjacent frame images from an image sequence;
inputting the images into a trained deep learning network to obtain depth information and pose information, wherein the photometric loss of the deep learning network is obtained from a spatial transformation model based on depth-aware pixel correspondence, and an omnidirectional auto-mask is used to prevent pixels of moving objects from participating in the photometric error calculation; obtaining the photometric loss from the spatial transformation model based on depth-aware pixel correspondence specifically comprises:
performing a spatial transformation on far regions using a homography matrix and constructing a first reconstructed image, wherein the far regions are treated as a plane at infinity;
performing a spatial transformation using the fundamental matrix and constructing a second reconstructed image;
computing a photometric error map based on the first reconstructed image and a photometric error map based on the second reconstructed image through the two pixel correspondences, and then taking the pixel-wise minimum to obtain the final photometric loss.
2. The self-supervised monocular depth estimation method combining a spatio-temporally enhanced photometric loss according to claim 1, wherein using the omnidirectional auto-mask to prevent pixels of moving objects from participating in the photometric error calculation specifically comprises:
predicting the initial depth and initial pose of a target frame with a pre-trained network, and generating an initial reconstructed image;
adding disturbance terms to the initial pose, and obtaining a plurality of hypothesized reconstructed frames through spatial transformation; generating a plurality of photometric error maps from the hypothesized reconstructed frames and the target frame, and obtaining a plurality of binary masks from the photometric error maps;
selecting the pixel-wise minimum over the plurality of binary masks as the final mask.
3. The self-supervised monocular depth estimation method combining a spatio-temporally enhanced photometric loss according to claim 2, wherein the disturbance terms are translational disturbance terms comprising [t_max, 0, 0], [-t_max, 0, 0], [0, 0, t_max] and [0, 0, -t_max], where t_max denotes the maximum component of the initialized translation vector.
4. A self-supervised monocular depth estimation device combining a spatio-temporally enhanced photometric loss, comprising:
an acquisition module, configured to acquire a plurality of adjacent frame images from an image sequence;
an estimation module, configured to input the images into a trained deep learning network to obtain depth information and pose information, wherein the photometric loss of the deep learning network is obtained from a spatial transformation model provided by a depth-aware pixel correspondence module, and an omnidirectional auto-mask module is used to prevent pixels of moving objects from participating in the photometric error calculation; the depth-aware pixel correspondence module comprises:
a first construction unit, configured to perform a spatial transformation on far regions using a homography matrix and construct a first reconstructed image, wherein the far regions are treated as a plane at infinity;
a second construction unit, configured to perform a spatial transformation using the fundamental matrix and construct a second reconstructed image;
a photometric loss acquisition unit, configured to compute a photometric error map based on the first reconstructed image and a photometric error map based on the second reconstructed image through the two pixel correspondences, and then take the pixel-wise minimum to obtain the final photometric loss.
5. The self-supervised monocular depth estimation device combining a spatio-temporally enhanced photometric loss according to claim 4, wherein the omnidirectional auto-mask module comprises:
an initial reconstruction generating unit, configured to predict the initial depth and initial pose of the target frame with a pre-trained network and generate an initial reconstructed image;
a binary mask generating unit, configured to add disturbance terms to the initial pose, obtain a plurality of hypothesized reconstructed frames through spatial transformation, generate a plurality of photometric error maps from the hypothesized reconstructed frames and the target frame, and obtain a plurality of binary masks from the photometric error maps;
a mask selecting unit, configured to select the pixel-wise minimum over the plurality of binary masks as the final mask.
6. The self-supervised monocular depth estimation device combining a spatio-temporally enhanced photometric loss according to claim 5, wherein the disturbance terms are translational disturbance terms comprising [t_max, 0, 0], [-t_max, 0, 0], [0, 0, t_max] and [0, 0, -t_max], where t_max denotes the maximum component of the initialized translation vector.
CN202210475411.0A 2022-04-29 2022-04-29 Self-supervision monocular depth estimation method and device combining space-time enhancement luminosity loss Active CN114998411B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210475411.0A CN114998411B (en) 2022-04-29 2022-04-29 Self-supervision monocular depth estimation method and device combining space-time enhancement luminosity loss

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210475411.0A CN114998411B (en) 2022-04-29 2022-04-29 Self-supervision monocular depth estimation method and device combining space-time enhancement luminosity loss

Publications (2)

Publication Number Publication Date
CN114998411A CN114998411A (en) 2022-09-02
CN114998411B true CN114998411B (en) 2024-01-09

Family

ID=83025390

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210475411.0A Active CN114998411B (en) 2022-04-29 2022-04-29 Self-supervision monocular depth estimation method and device combining space-time enhancement luminosity loss

Country Status (1)

Country Link
CN (1) CN114998411B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116245927B (en) * 2023-02-09 2024-01-16 湖北工业大学 ConvDepth-based self-supervision monocular depth estimation method and system

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110264509A (en) * 2018-04-27 2019-09-20 腾讯科技(深圳)有限公司 Determine the method, apparatus and its storage medium of the pose of image-capturing apparatus
CN111260680A (en) * 2020-01-13 2020-06-09 杭州电子科技大学 RGBD camera-based unsupervised pose estimation network construction method
CN111369608A (en) * 2020-05-29 2020-07-03 南京晓庄学院 Visual odometer method based on image depth estimation
CN111739078A (en) * 2020-06-15 2020-10-02 大连理工大学 Monocular unsupervised depth estimation method based on context attention mechanism
CN111783582A (en) * 2020-06-22 2020-10-16 东南大学 Unsupervised monocular depth estimation algorithm based on deep learning
CN113160390A (en) * 2021-04-28 2021-07-23 北京理工大学 Three-dimensional dense reconstruction method and system
CN113240722A (en) * 2021-04-28 2021-08-10 浙江大学 Self-supervision depth estimation method based on multi-frame attention
CN113313732A (en) * 2021-06-25 2021-08-27 南京航空航天大学 Forward-looking scene depth estimation method based on self-supervision learning
CN113450410A (en) * 2021-06-29 2021-09-28 浙江大学 Monocular depth and pose joint estimation method based on epipolar geometry
CN113570658A (en) * 2021-06-10 2021-10-29 西安电子科技大学 Monocular video depth estimation method based on depth convolutional network
CN114022799A (en) * 2021-09-23 2022-02-08 中国人民解放军军事科学院国防科技创新研究院 Self-supervision monocular depth estimation method and device
CN114170286A (en) * 2021-11-04 2022-03-11 西安理工大学 Monocular depth estimation method based on unsupervised depth learning

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102506959B1 (en) * 2018-05-17 2023-03-07 나이앤틱, 인크. Self-supervised training of depth estimation systems
US10970856B2 (en) * 2018-12-27 2021-04-06 Baidu Usa Llc Joint learning of geometry and motion with three-dimensional holistic understanding
US11176709B2 (en) * 2019-10-17 2021-11-16 Toyota Research Institute, Inc. Systems and methods for self-supervised scale-aware training of a model for monocular depth estimation
US11257231B2 (en) * 2020-06-17 2022-02-22 Toyota Research Institute, Inc. Camera agnostic depth network

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110264509A (en) * 2018-04-27 2019-09-20 腾讯科技(深圳)有限公司 Determine the method, apparatus and its storage medium of the pose of image-capturing apparatus
CN111260680A (en) * 2020-01-13 2020-06-09 杭州电子科技大学 RGBD camera-based unsupervised pose estimation network construction method
CN111369608A (en) * 2020-05-29 2020-07-03 南京晓庄学院 Visual odometer method based on image depth estimation
CN111739078A (en) * 2020-06-15 2020-10-02 大连理工大学 Monocular unsupervised depth estimation method based on context attention mechanism
CN111783582A (en) * 2020-06-22 2020-10-16 东南大学 Unsupervised monocular depth estimation algorithm based on deep learning
CN113160390A (en) * 2021-04-28 2021-07-23 北京理工大学 Three-dimensional dense reconstruction method and system
CN113240722A (en) * 2021-04-28 2021-08-10 浙江大学 Self-supervision depth estimation method based on multi-frame attention
CN113570658A (en) * 2021-06-10 2021-10-29 西安电子科技大学 Monocular video depth estimation method based on depth convolutional network
CN113313732A (en) * 2021-06-25 2021-08-27 南京航空航天大学 Forward-looking scene depth estimation method based on self-supervision learning
CN113450410A (en) * 2021-06-29 2021-09-28 浙江大学 Monocular depth and pose joint estimation method based on epipolar geometry
CN114022799A (en) * 2021-09-23 2022-02-08 中国人民解放军军事科学院国防科技创新研究院 Self-supervision monocular depth estimation method and device
CN114170286A (en) * 2021-11-04 2022-03-11 西安理工大学 Monocular depth estimation method based on unsupervised depth learning

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Unsupervised learning of depth and ego-motion from video; T. Zhou et al.; 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 1851-1858 *
Research on image depth information estimation methods based on domain adaptation; 詹雁; China Masters' Theses Full-text Database, Information Science and Technology, No. 04, 2021; I138-811 *
Monocular image depth estimation based on unsupervised learning; 胡智程; China Masters' Theses Full-text Database, Information Science and Technology, No. 08, 2021; I138-615 *
RGB-D SLAM algorithm for indoor dynamic scenes based on semantic priors and depth constraints; 姜昊辰 et al.; Information and Control, Vol. 50, No. 03, 2021; 275-286 *
Monocular depth estimation combining attention and unsupervised deep learning; 岑仕杰 et al.; Journal of Guangdong University of Technology, No. 04; 35-41 *

Also Published As

Publication number Publication date
CN114998411A (en) 2022-09-02

Similar Documents

Publication Publication Date Title
Mitrokhin et al. EV-IMO: Motion segmentation dataset and learning pipeline for event cameras
WO2020046066A1 (en) Method for training convolutional neural network to reconstruct an image and system for depth map generation from an image
Yang et al. Fusion of median and bilateral filtering for range image upsampling
CN109815847B (en) Visual SLAM method based on semantic constraint
US20140147031A1 (en) Disparity Estimation for Misaligned Stereo Image Pairs
CN113850900B (en) Method and system for recovering depth map based on image and geometric clues in three-dimensional reconstruction
Goncalves et al. Deepdive: An end-to-end dehazing method using deep learning
CN114998411B (en) Self-supervision monocular depth estimation method and device combining space-time enhancement luminosity loss
Jeon et al. Struct-MDC: Mesh-refined unsupervised depth completion leveraging structural regularities from visual SLAM
Yang et al. SAM-Net: Semantic probabilistic and attention mechanisms of dynamic objects for self-supervised depth and camera pose estimation in visual odometry applications
CN113920270B (en) Layout reconstruction method and system based on multi-view panorama
Tian et al. Monocular depth estimation based on a single image: a literature review
Lu et al. Stereo disparity optimization with depth change constraint based on a continuous video
CN117876452A (en) Self-supervision depth estimation method and system based on moving object pose estimation
Zhang et al. Depth map prediction from a single image with generative adversarial nets
CN112308893A (en) Monocular depth estimation method based on iterative search strategy
CN112308917A (en) Vision-based mobile robot positioning method
Bhutani et al. Unsupervised Depth and Confidence Prediction from Monocular Images using Bayesian Inference
CN113160247B (en) Anti-noise twin network target tracking method based on frequency separation
CN115239559A (en) Depth map super-resolution method and system for fusion view synthesis
Liu et al. Binocular depth estimation using convolutional neural network with Siamese branches
Yuan et al. SceneFactory: A Workflow-centric and Unified Framework for Incremental Scene Modeling
Chowdhury et al. An efficient algorithm for stereo correspondence matching
Hu et al. Self-supervised monocular visual odometry based on cross-correlation
Fan et al. Deeper into Self-Supervised Monocular Indoor Depth Estimation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant