CN114998411B - Self-supervised monocular depth estimation method and device combining spatio-temporally enhanced photometric loss - Google Patents
Self-supervised monocular depth estimation method and device combining spatio-temporally enhanced photometric loss
- Publication number
- CN114998411B CN202210475411.0A CN202210475411A
- Authority
- CN
- China
- Prior art keywords
- luminosity
- depth
- max
- reconstruction
- self
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention relates to a self-supervised monocular depth estimation method and device combining a spatio-temporally enhanced photometric loss. The method comprises the following steps: acquiring several adjacent frame images from an image sequence; and inputting the images into a trained deep-learning network to obtain depth information and pose information, wherein the photometric loss of the deep-learning network is obtained from a spatial transformation model based on depth-aware pixel correspondence, and an omnidirectional auto-mask prevents pixels of moving objects from participating in the photometric error calculation. The invention improves the accuracy of the photometric loss and thereby better supervises the learning of the depth network.
Description
Technical Field
The invention relates to the technical field of computer vision, and in particular to a self-supervised monocular depth estimation method and device combining a spatio-temporally enhanced photometric loss.
Background
Estimating the depth of a scene from an image, i.e. image depth estimation, is a fundamental and very important task in computer vision. A good image depth estimation algorithm can be applied to outdoor driving scenes, indoor small robots and other fields, and therefore has great application value. While a robot or an autonomous vehicle is operating, the depth estimation algorithm provides scene depth information that assists the robot in planning its path or avoiding obstacles in its next movement.
Image depth estimation methods can be divided into supervised and self-supervised approaches. Supervised methods mainly use a neural network to establish a mapping between the image and the depth map and train it under the supervision of ground-truth depth, so that the network gradually acquires the ability to fit depth. However, because ground-truth depth is expensive to obtain, self-supervised methods have become the mainstream in recent years. Among them, methods based on image sequences have attracted wide attention from researchers because they are more broadly applicable than methods that require binocular image pairs for training.
A self-supervised monocular depth framework based on image sequences mainly consists of a depth estimation network and a pose estimation network, which respectively predict the depth of the target frame and the pose transformation between the target frame and the source frame. Combining the estimated depth and pose, the source frame can be warped into the coordinate system of the target frame to obtain a reconstructed image, and the photometric difference between the target frame and the reconstructed image, i.e. the photometric loss, can supervise the simultaneous training of the two networks. As the photometric loss decreases, the depth estimated by the network becomes increasingly accurate.
The photometric loss is generated by a spatial transformation model. The existing spatial transformation model follows the theoretical rigid transformation, but during the computation the error in the translation vector of the pose introduces a depth estimation error, and the larger the depth, the larger this error becomes. In addition, to address the inaccurate photometric loss caused by moving pixels that violate photometric consistency, the main idea of existing approaches is to generate, during training, a binary mask that filters out pixels whose appearance does not change from one frame to the next; however, such a binary mask can only identify objects that move in the same direction as the camera.
Disclosure of Invention
The inventors found that the reason why the depth estimation error grows with depth is the following. The purpose of the spatial transformation is to make corresponding pixels in the target frame and the source frame coincide in the pixel plane after the transformation; suppose a near point P_N is used to solve for the corresponding pixels p_t and p_s, as shown in Fig. 1. The principle of self-supervised depth estimation is to make the estimated pose and depth more accurate by minimizing the distance between p_t and p_s. For near regions, as shown in Fig. 1, with a given set of points the estimated pose becomes accurate, and the depth performs well, only when p_t coincides with the transformed point p_F. For far regions, as shown in Fig. 2, it is sufficient that the predicted rotation matrix is accurate for p_t and p_s to coincide. Therefore, if the photometric error is constructed with the estimated rotation matrix and translation vector without distinguishing near from far, the uncertainty of the photometric error increases greatly, and the depth estimation result deteriorates.
The technical problem to be solved by the invention is to provide a self-supervised monocular depth estimation method and device combining a spatio-temporally enhanced photometric loss, which can improve the accuracy of the photometric loss and thereby better supervise the learning of the depth network.
The technical scheme adopted to solve the above technical problem is as follows: a self-supervised monocular depth estimation method combining a spatio-temporally enhanced photometric loss is provided, comprising the following steps:
acquiring several adjacent frame images from an image sequence;
and inputting the images into a trained deep-learning network to obtain depth information and pose information, wherein the photometric loss of the deep-learning network is obtained from a spatial transformation model based on depth-aware pixel correspondence, and an omnidirectional auto-mask prevents pixels of moving objects from participating in the photometric error calculation.
The photometric loss is obtained from the spatial transformation model based on depth-aware pixel correspondence specifically as follows:
performing a spatial transformation on far regions using a homography matrix to construct a first reconstructed image, the far regions being treated as a plane at infinity;
performing a spatial transformation using the fundamental matrix to construct a second reconstructed image;
and computing a photometric error map based on the first reconstructed image and a photometric error map based on the second reconstructed image from the two pixel correspondences, and then taking the pixel-wise minimum to obtain the final photometric loss.
Preventing pixels of moving objects from participating in the photometric error calculation with the omnidirectional auto-mask specifically comprises:
predicting the initial depth and initial pose of a target frame with a pre-trained network and generating an initial reconstructed image;
adding perturbation terms to the initial pose and obtaining several hypothetical reconstructed frames through spatial transformation; generating several photometric error maps from the hypothetical reconstructed frames and the target frame, and obtaining several binary masks from these photometric error maps;
and taking the pixel-wise minimum over the binary masks as the final mask.
The perturbation terms are translation perturbation terms, including [t_max, 0, 0], [-t_max, 0, 0], [0, 0, t_max] and [0, 0, -t_max], where t_max denotes the maximum component of the initial translation vector.
The technical scheme adopted to solve the above technical problem also provides a self-supervised monocular depth estimation device combining a spatio-temporally enhanced photometric loss, comprising:
an acquisition module for acquiring several adjacent frame images from an image sequence;
and an estimation module for inputting the images into a trained deep-learning network to obtain depth information and pose information, wherein the photometric loss of the deep-learning network is obtained by a spatial transformation model in a depth-aware pixel correspondence module, and an omnidirectional auto-mask module prevents pixels of moving objects from participating in the photometric error calculation.
The depth-aware pixel correspondence module comprises:
a first construction unit for performing a spatial transformation on far regions using a homography matrix to construct a first reconstructed image, the far regions being treated as a plane at infinity;
a second construction unit for performing a spatial transformation using the fundamental matrix to construct a second reconstructed image;
and a photometric loss acquisition unit for computing a photometric error map based on the first reconstructed image and a photometric error map based on the second reconstructed image from the two pixel correspondences, and then taking the pixel-wise minimum to obtain the final photometric loss.
The omnidirectional auto-mask module comprises:
an initial reconstruction generation unit for predicting the initial depth and initial pose of the target frame with a pre-trained network and generating an initial reconstructed image;
a binary mask generation unit for adding perturbation terms to the initial pose, obtaining several hypothetical reconstructed frames through spatial transformation, generating several photometric error maps from the hypothetical reconstructed frames and the target frame, and obtaining several binary masks from these photometric error maps;
and a mask selection unit for taking the pixel-wise minimum over the binary masks as the final mask.
The perturbation terms are translation perturbation terms, including [t_max, 0, 0], [-t_max, 0, 0], [0, 0, t_max] and [0, 0, -t_max], where t_max denotes the maximum component of the initial translation vector.
Advantageous effects
Due to the adoption of the above technical scheme, compared with the prior art the invention has the following advantages and positive effects: the invention mines the pixel correspondence of far regions through depth-aware pixel correspondence, alleviating the inaccurate pixel correspondence of far regions, and uses an omnidirectional auto-mask to obtain an omnidirectional binary mask that prevents pixels of moving objects from participating in the photometric error calculation. By improving the spatial transformation and generating an auto-mask for dynamic objects, the invention improves the accuracy of the photometric loss and thereby better supervises the learning of the depth network.
Drawings
FIG. 1 is a schematic diagram of pose solving with a near point;
FIG. 2 is a schematic diagram of pose solving with a far point;
FIG. 3 is a schematic diagram of the basic Monodepth2 framework;
FIG. 4 is a schematic diagram of photometric loss generation in the first embodiment of the present invention;
FIG. 5 is a schematic diagram of the omnidirectional auto-mask in the first embodiment of the present invention.
Detailed Description
The invention will be further illustrated below with reference to specific embodiments. It should be understood that these embodiments are only intended to illustrate the invention and not to limit its scope. Furthermore, it should be understood that, after reading the teachings of the invention, those skilled in the art may make various changes or modifications to the invention, and such equivalents likewise fall within the scope defined by the claims appended hereto.
A first embodiment of the present invention relates to a self-supervised monocular depth estimation method combining a spatio-temporally enhanced photometric loss, comprising the following steps: acquiring several adjacent frame images from an image sequence; and inputting the images into a trained deep-learning network to obtain depth information and pose information, wherein the photometric loss of the deep-learning network is obtained from a spatial transformation model based on depth-aware pixel correspondence, and an omnidirectional auto-mask prevents pixels of moving objects from participating in the photometric error calculation.
The method of this embodiment can be used directly in general self-supervised monocular depth estimation; any work built on the SfMLearner framework can adopt it. In the original framework, the spatial transformation part is replaced by the spatial transformation model based on depth-aware pixel correspondence, and the auto-mask part is replaced by the omnidirectional auto-mask.
The invention is further illustrated below by taking the Monodepth2 framework of Godard et al. as the base.
For easier understanding, the overall framework of Monodepth2 is described first. As shown in Fig. 3, its input is the RGB images of three adjacent frames of a sequence, and its outputs are the depth of the target frame and the pose transformations between the target frame and the source frames.
The basic framework of this embodiment is the same as that in Fig. 3. Since this embodiment mainly improves how the spatial transformation generates the photometric loss and the auto-masking part, these two parts of Monodepth2 are described first:
Monodepth2 uses the same spatial transformation model as SfMLearner, based on the depth D_t of the target frame I_t and the pose T_{t→s} = [R_{t→s} | t_{t→s}] between the target frame I_t and the source frame I_s. For a pair of corresponding pixels p_t and p_s in the target and source frames that observe the same 3D point, the following relation holds:

D_s K^{-1} p_s = T_{t→s} D_t K^{-1} p_t

where K is the camera intrinsic matrix. Since monocular depth is only estimated up to scale, this can be rewritten as the spatial transformation

p_s ~ K T_{t→s} D_t K^{-1} p_t

In the spatial geometric transformation, K T_{t→s} K^{-1} is defined as the fundamental matrix F describing the inter-frame pixel correspondence, and this relation can then be used to warp the source frame and construct the reconstructed frame.
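As an illustration of this inverse-warping step, the following is a minimal PyTorch-style sketch that backprojects the target pixels with the predicted depth, transforms them with the predicted pose and samples the source frame; the function name, tensor shapes and batching conventions are illustrative assumptions rather than the patent's actual implementation.

```python
import torch
import torch.nn.functional as F

def reconstruct_from_source(I_s, D_t, K, K_inv, T_t2s):
    """Warp the source frame into the target view via p_s ~ K T_{t->s} D_t K^{-1} p_t.

    I_s:      source image,        (B, 3, H, W)
    D_t:      target-frame depth,  (B, 1, H, W)
    K, K_inv: camera intrinsics and their inverse, (B, 3, 3)
    T_t2s:    pose target -> source as a (B, 3, 4) matrix [R | t]
    """
    B, _, H, W = I_s.shape
    dev = I_s.device
    # Homogeneous pixel coordinates p_t = (u, v, 1)^T for every target pixel
    v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32, device=dev),
                          torch.arange(W, dtype=torch.float32, device=dev),
                          indexing="ij")
    p_t = torch.stack([u, v, torch.ones_like(u)], 0).view(1, 3, -1).expand(B, -1, -1)

    # Back-project to 3D in the target camera: P = D_t * K^{-1} p_t
    P = D_t.view(B, 1, -1) * (K_inv @ p_t)                        # (B, 3, H*W)
    # Rigid transform into the source camera and project: cam ~ K (R P + t)
    P_h = torch.cat([P, torch.ones(B, 1, H * W, device=dev)], 1)  # (B, 4, H*W)
    cam = K @ (T_t2s @ P_h)                                       # (B, 3, H*W)
    p_s = cam[:, :2] / cam[:, 2:3].clamp(min=1e-7)                # source-frame pixel coordinates

    # Normalise to [-1, 1] and bilinearly sample the source image
    grid_x = 2.0 * p_s[:, 0] / (W - 1) - 1.0
    grid_y = 2.0 * p_s[:, 1] / (H - 1) - 1.0
    grid = torch.stack([grid_x, grid_y], dim=-1).view(B, H, W, 2)
    return F.grid_sample(I_s, grid, padding_mode="border", align_corners=True)
```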
From the target frame and the reconstructed frame, a photometric error pe can be constructed, consisting of an L1 term and a structural similarity (SSIM) term:

pe(I_a, I_b) = (α/2) (1 − SSIM(I_a, I_b)) + (1 − α) ||I_a − I_b||_1

where α is a hyperparameter, set to 0.85 in Monodepth2.
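One possible implementation of this per-pixel photometric error is sketched below, using a simplified 3x3 average-pooling SSIM; the constants, padding choice and helper names are assumptions and may differ from Monodepth2's released code.

```python
import torch.nn.functional as F

def ssim(x, y, C1=0.01 ** 2, C2=0.03 ** 2):
    """Simplified SSIM over a 3x3 window (zero padding; Monodepth2 itself uses reflection padding)."""
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (sigma_x + sigma_y + C2)
    return (num / den).clamp(0, 1)

def photometric_error(I_a, I_b, alpha=0.85):
    """pe(I_a, I_b) = alpha/2 * (1 - SSIM(I_a, I_b)) + (1 - alpha) * |I_a - I_b|, per pixel."""
    l1 = (I_a - I_b).abs().mean(1, keepdim=True)
    dssim = ((1.0 - ssim(I_a, I_b)) / 2.0).mean(1, keepdim=True)
    return alpha * dssim + (1.0 - alpha) * l1
```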
The auto-mask of Monodepth2 is mainly used to address the inaccurate photometric loss caused by moving pixels that violate photometric consistency. Its main idea is to find, during training, pixels whose appearance does not change from one frame to the next; the generated binary mask μ is

μ = [ pe(I_t, Î_{s→t}) < pe(I_t, I_s) ]

where [·] is the Iverson bracket used to generate the binary mask, I_t is the target frame, I_s is the source frame, and Î_{s→t} is the reconstructed frame obtained from the spatial transformation.
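Continuing the sketch above, the Iverson-bracket mask and the way it gates the photometric error could look as follows (a simplified single-source-frame version under the same assumptions):

```python
def auto_mask(I_t, I_s, I_recon):
    """mu = [ pe(I_t, I_recon) < pe(I_t, I_s) ]: keep a pixel only where warping actually helps."""
    return (photometric_error(I_t, I_recon) < photometric_error(I_t, I_s)).float()

# Example usage: gate the per-pixel photometric error before averaging it into a loss
# mu = auto_mask(I_t, I_s, I_recon)
# loss = (mu * photometric_error(I_t, I_recon)).mean()
```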
For the spatial transformation that generates the photometric loss, this embodiment obtains the photometric loss from a spatial transformation model based on depth-aware pixel correspondence. As shown in Fig. 4, the specific steps are as follows:
During the spatial transformation, a sufficiently far region can be regarded as a plane at infinity, which satisfies

n^T P + D = 0

where n is the normal vector of the plane, P is a 3D point on the plane, and D is the depth (distance) of the plane. The rigid transformation of such a 3D point can therefore be written as

T_{t→s} P = (R_{t→s} − t_{t→s} n^T / D) P

Substituting this into the spatial transformation relation gives

p_s ~ K (R_{t→s} − t_{t→s} n^T / D) D_t K^{-1} p_t

When D_t tends to infinity, i.e. for the plane at infinity,

p_s ~ K R_{t→s} D_t K^{-1} p_t

K R_{t→s} K^{-1} is defined as the homography at infinity H_∞, so for far regions a reconstructed image, denoted Î_H, is constructed by a spatial transformation that uses only the rotation matrix. To distinguish the two, the reconstructed image obtained with the fundamental-matrix correspondence (the full pose) is denoted Î_F. Because monocular depth is only estimated up to scale, the appropriate pixel correspondence cannot be selected directly from the predicted depth. This embodiment therefore adopts an adaptive selection scheme: the two photometric error maps are computed from the two pixel correspondences and the pixel-wise minimum is taken, i.e. the final photometric error is

pe_final = min( pe(I_t, Î_F), pe(I_t, Î_H) )
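Reusing the warping and photometric-error helpers from the earlier sketches, the adaptive pixel-wise selection between the two correspondences might be written as follows; treating the rotation-only warp as the homography-at-infinity reconstruction is an assumption of this sketch, as are the function names:

```python
import torch

def depth_aware_photometric_error(I_t, I_s, D_t, K, K_inv, R_t2s, t_t2s):
    """Pixel-wise minimum of the fundamental-matrix-based and homography-at-infinity-based errors."""
    # Full pose [R | t]: ordinary spatial transformation -> I_F
    T_full = torch.cat([R_t2s, t_t2s.unsqueeze(-1)], dim=-1)             # (B, 3, 4)
    I_F = reconstruct_from_source(I_s, D_t, K, K_inv, T_full)
    # Rotation only [R | 0]: warp corresponding to the homography at infinity for far regions -> I_H
    T_rot = torch.cat([R_t2s, torch.zeros_like(t_t2s).unsqueeze(-1)], dim=-1)
    I_H = reconstruct_from_source(I_s, D_t, K, K_inv, T_rot)
    # Adaptive, per-pixel selection of the better correspondence
    return torch.minimum(photometric_error(I_t, I_F),
                         photometric_error(I_t, I_H))
```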
for the omnidirectional automatic mask, in this embodiment, the image sequence is directly input into the module, and after the mask result is obtained, the mask result is applied to the photometric error to block out the unreliable part, as shown in fig. 5, specifically as follows:
In this embodiment, a pre-trained Monodepth2 network is used to predict the initial depth D_init of the target frame and the initial pose T_init, from which an initial reconstructed image I_init is generated. Because this depth and pose are already fairly accurate, the photometric error of regions that satisfy photometric consistency is already small, whereas regions that violate photometric consistency still have room for their error to decrease.

Following this idea, perturbation terms are added to the initial pose to obtain several perturbed poses, and several hypothetical reconstructed frames are produced by spatial transformation. Using these reconstructed frames I_i, i ∈ {1, 2, …}, together with the target frame, several photometric error maps are generated, from which several binary masks are obtained; the mask corresponding to moving-object pixels in each direction is

M_i = [ pe(I_t, I_init) < pe(I_t, I_i) ]

To capture objects moving in all directions, the pixel-wise minimum of the generated masks is taken as the final mask:

M_oA = min(M_1, M_2, …)

In the implementation of this embodiment, only the translation vector is perturbed, with translation perturbation terms t_i: t_1 = [t_max, 0, 0], t_2 = [-t_max, 0, 0], t_3 = [0, 0, t_max] and t_4 = [0, 0, -t_max], where t_max is the maximum component of the initial translation vector.
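A sketch of the omnidirectional auto-mask, again reusing the helpers defined above, is given below; the batch layout, the use of a single source frame and the way the perturbations are added to the initial translation are illustrative assumptions:

```python
import torch

def omnidirectional_automask(I_t, I_s, D_init, K, K_inv, R_init, t_init):
    """Omnidirectional auto-mask M_oA (illustrative sketch).

    R_init: (B, 3, 3) and t_init: (B, 3) come from a pre-trained depth/pose network.
    Relies on reconstruct_from_source() and photometric_error() defined above.
    """
    t_max = t_init.abs().amax(dim=1, keepdim=True)                 # (B, 1)
    zero = torch.zeros_like(t_max)
    # Four translation perturbation terms: +/- x and +/- z
    perturbs = [torch.cat([t_max, zero, zero], dim=1),
                torch.cat([-t_max, zero, zero], dim=1),
                torch.cat([zero, zero, t_max], dim=1),
                torch.cat([zero, zero, -t_max], dim=1)]

    def reconstruct(t_vec):
        T = torch.cat([R_init, t_vec.unsqueeze(-1)], dim=-1)       # (B, 3, 4)
        return reconstruct_from_source(I_s, D_init, K, K_inv, T)

    pe_init = photometric_error(I_t, reconstruct(t_init))          # error with the unperturbed pose
    masks = []
    for dt in perturbs:
        pe_i = photometric_error(I_t, reconstruct(t_init + dt))
        # M_i = [ pe(I_t, I_init) < pe(I_t, I_i) ]: static pixels keep the value 1
        masks.append((pe_init < pe_i).float())
    # M_oA = min(M_1, ..., M_4): a pixel is kept only if every perturbation makes it worse
    return torch.stack(masks, dim=0).min(dim=0).values
```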
It is readily seen that the invention mines the pixel correspondence of far regions through depth-aware pixel correspondence, alleviating the inaccurate pixel correspondence of far regions, and uses an omnidirectional auto-mask to obtain an omnidirectional binary mask that keeps the pixels of moving objects out of the photometric error calculation. By improving the spatial transformation and generating an auto-mask for dynamic objects, the invention improves the accuracy of the photometric loss and thereby better supervises the learning of the depth network. Applying the depth-aware pixel correspondence and the omnidirectional auto-mask of this embodiment to the Monodepth2 framework of Godard et al. therefore yields monocular depth estimation results of higher accuracy.
A second embodiment of the present invention relates to a self-supervised monocular depth estimation device combining a spatio-temporally enhanced photometric loss, comprising: an acquisition module for acquiring several adjacent frame images from an image sequence; and an estimation module for inputting the images into a trained deep-learning network to obtain depth information and pose information, wherein the photometric loss of the deep-learning network is obtained by a spatial transformation model in a depth-aware pixel correspondence module, and an omnidirectional auto-mask module prevents pixels of moving objects from participating in the photometric error calculation.
The depth-aware pixel correspondence module comprises: a first construction unit for performing a spatial transformation on far regions using a homography matrix to construct a first reconstructed image, the far regions being treated as a plane at infinity; a second construction unit for performing a spatial transformation using the fundamental matrix to construct a second reconstructed image; and a photometric loss acquisition unit for computing a photometric error map based on the first reconstructed image and a photometric error map based on the second reconstructed image from the two pixel correspondences, and then taking the pixel-wise minimum to obtain the final photometric loss.
The omnidirectional auto-mask module comprises: an initial reconstruction generation unit for predicting the initial depth and initial pose of the target frame with a pre-trained network and generating an initial reconstructed image; a binary mask generation unit for adding perturbation terms to the initial pose, obtaining several hypothetical reconstructed frames through spatial transformation, generating several photometric error maps from the hypothetical reconstructed frames and the target frame, and obtaining several binary masks from these photometric error maps; and a mask selection unit for taking the pixel-wise minimum over the binary masks as the final mask. The perturbation terms are translation perturbation terms, including [t_max, 0, 0], [-t_max, 0, 0], [0, 0, t_max] and [0, 0, -t_max], where t_max denotes the maximum component of the initial translation vector.
Claims (6)
1. A self-supervised monocular depth estimation method combining a spatio-temporally enhanced photometric loss, comprising the following steps:
acquiring several adjacent frame images from an image sequence;
inputting the images into a trained deep-learning network to obtain depth information and pose information, wherein the photometric loss of the deep-learning network is obtained from a spatial transformation model based on depth-aware pixel correspondence, and an omnidirectional auto-mask prevents pixels of moving objects from participating in the photometric error calculation; the photometric loss being obtained from the spatial transformation model based on depth-aware pixel correspondence specifically as follows:
performing a spatial transformation on far regions using a homography matrix to construct a first reconstructed image, the far regions being treated as a plane at infinity;
performing a spatial transformation using the fundamental matrix to construct a second reconstructed image;
and computing a photometric error map based on the first reconstructed image and a photometric error map based on the second reconstructed image from the two pixel correspondences, and then taking the pixel-wise minimum to obtain the final photometric loss.
2. The self-supervised monocular depth estimation method combining a spatio-temporally enhanced photometric loss according to claim 1, wherein preventing pixels of moving objects from participating in the photometric error calculation with the omnidirectional auto-mask specifically comprises:
predicting the initial depth and initial pose of a target frame with a pre-trained network and generating an initial reconstructed image;
adding perturbation terms to the initial pose and obtaining several hypothetical reconstructed frames through spatial transformation; generating several photometric error maps from the hypothetical reconstructed frames and the target frame, and obtaining several binary masks from these photometric error maps;
and taking the pixel-wise minimum over the binary masks as the final mask.
3. The self-supervised monocular depth estimation method combining a spatio-temporally enhanced photometric loss according to claim 2, wherein the perturbation terms are translation perturbation terms, including [t_max, 0, 0], [-t_max, 0, 0], [0, 0, t_max] and [0, 0, -t_max], where t_max denotes the maximum component of the initial translation vector.
4. A self-supervised monocular depth estimation device combining a spatio-temporally enhanced photometric loss, comprising:
an acquisition module for acquiring several adjacent frame images from an image sequence;
an estimation module for inputting the images into a trained deep-learning network to obtain depth information and pose information, wherein the photometric loss of the deep-learning network is obtained by a spatial transformation model in a depth-aware pixel correspondence module, and an omnidirectional auto-mask module prevents pixels of moving objects from participating in the photometric error calculation; the depth-aware pixel correspondence module comprising:
a first construction unit for performing a spatial transformation on far regions using a homography matrix to construct a first reconstructed image, the far regions being treated as a plane at infinity;
a second construction unit for performing a spatial transformation using the fundamental matrix to construct a second reconstructed image;
and a photometric loss acquisition unit for computing a photometric error map based on the first reconstructed image and a photometric error map based on the second reconstructed image from the two pixel correspondences, and then taking the pixel-wise minimum to obtain the final photometric loss.
5. The self-supervised monocular depth estimation device combining a spatio-temporally enhanced photometric loss according to claim 4, wherein the omnidirectional auto-mask module comprises:
an initial reconstruction generation unit for predicting the initial depth and initial pose of the target frame with a pre-trained network and generating an initial reconstructed image;
a binary mask generation unit for adding perturbation terms to the initial pose, obtaining several hypothetical reconstructed frames through spatial transformation, generating several photometric error maps from the hypothetical reconstructed frames and the target frame, and obtaining several binary masks from these photometric error maps;
and a mask selection unit for taking the pixel-wise minimum over the binary masks as the final mask.
6. The self-supervised monocular depth estimation device combining a spatio-temporally enhanced photometric loss according to claim 5, wherein the perturbation terms are translation perturbation terms, including [t_max, 0, 0], [-t_max, 0, 0], [0, 0, t_max] and [0, 0, -t_max], where t_max denotes the maximum component of the initial translation vector.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210475411.0A CN114998411B (en) | 2022-04-29 | 2022-04-29 | Self-supervision monocular depth estimation method and device combining space-time enhancement luminosity loss |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210475411.0A CN114998411B (en) | 2022-04-29 | 2022-04-29 | Self-supervision monocular depth estimation method and device combining space-time enhancement luminosity loss |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114998411A CN114998411A (en) | 2022-09-02 |
CN114998411B true CN114998411B (en) | 2024-01-09 |
Family
ID=83025390
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210475411.0A Active CN114998411B (en) | 2022-04-29 | 2022-04-29 | Self-supervision monocular depth estimation method and device combining space-time enhancement luminosity loss |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114998411B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116245927B (en) * | 2023-02-09 | 2024-01-16 | 湖北工业大学 | ConvDepth-based self-supervision monocular depth estimation method and system |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102506959B1 (en) * | 2018-05-17 | 2023-03-07 | 나이앤틱, 인크. | Self-supervised training of depth estimation systems |
US10970856B2 (en) * | 2018-12-27 | 2021-04-06 | Baidu Usa Llc | Joint learning of geometry and motion with three-dimensional holistic understanding |
US11176709B2 (en) * | 2019-10-17 | 2021-11-16 | Toyota Research Institute, Inc. | Systems and methods for self-supervised scale-aware training of a model for monocular depth estimation |
US11257231B2 (en) * | 2020-06-17 | 2022-02-22 | Toyota Research Institute, Inc. | Camera agnostic depth network |
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110264509A (en) * | 2018-04-27 | 2019-09-20 | 腾讯科技(深圳)有限公司 | Determine the method, apparatus and its storage medium of the pose of image-capturing apparatus |
CN111260680A (en) * | 2020-01-13 | 2020-06-09 | 杭州电子科技大学 | RGBD camera-based unsupervised pose estimation network construction method |
CN111369608A (en) * | 2020-05-29 | 2020-07-03 | 南京晓庄学院 | Visual odometer method based on image depth estimation |
CN111739078A (en) * | 2020-06-15 | 2020-10-02 | 大连理工大学 | Monocular unsupervised depth estimation method based on context attention mechanism |
CN111783582A (en) * | 2020-06-22 | 2020-10-16 | 东南大学 | Unsupervised monocular depth estimation algorithm based on deep learning |
CN113160390A (en) * | 2021-04-28 | 2021-07-23 | 北京理工大学 | Three-dimensional dense reconstruction method and system |
CN113240722A (en) * | 2021-04-28 | 2021-08-10 | 浙江大学 | Self-supervision depth estimation method based on multi-frame attention |
CN113570658A (en) * | 2021-06-10 | 2021-10-29 | 西安电子科技大学 | Monocular video depth estimation method based on depth convolutional network |
CN113313732A (en) * | 2021-06-25 | 2021-08-27 | 南京航空航天大学 | Forward-looking scene depth estimation method based on self-supervision learning |
CN113450410A (en) * | 2021-06-29 | 2021-09-28 | 浙江大学 | Monocular depth and pose joint estimation method based on epipolar geometry |
CN114022799A (en) * | 2021-09-23 | 2022-02-08 | 中国人民解放军军事科学院国防科技创新研究院 | Self-supervision monocular depth estimation method and device |
CN114170286A (en) * | 2021-11-04 | 2022-03-11 | 西安理工大学 | Monocular depth estimation method based on unsupervised depth learning |
Non-Patent Citations (5)
Title |
---|
Unsupervised learning of depth and ego-motion from video; T. Zhou et al.; 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 1851-1858 *
Research on image depth estimation methods based on domain adaptation; Zhan Yan; China Master's Theses Full-text Database, Information Science and Technology, No. 04, 2021; I138-811 *
Monocular image depth estimation based on unsupervised learning; Hu Zhicheng; China Master's Theses Full-text Database, Information Science and Technology, No. 08, 2021; I138-615 *
RGB-D SLAM algorithm for indoor dynamic scenes based on semantic priors and depth constraints; Jiang Haochen et al.; Information and Control, Vol. 50, No. 03, 2021; 275-286 *
Monocular depth estimation combining attention and unsupervised deep learning; Cen Shijie et al.; Journal of Guangdong University of Technology, No. 04; 35-41 *
Also Published As
Publication number | Publication date |
---|---|
CN114998411A (en) | 2022-09-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Mitrokhin et al. | EV-IMO: Motion segmentation dataset and learning pipeline for event cameras | |
WO2020046066A1 (en) | Method for training convolutional neural network to reconstruct an image and system for depth map generation from an image | |
Yang et al. | Fusion of median and bilateral filtering for range image upsampling | |
CN109815847B (en) | Visual SLAM method based on semantic constraint | |
US20140147031A1 (en) | Disparity Estimation for Misaligned Stereo Image Pairs | |
CN113850900B (en) | Method and system for recovering depth map based on image and geometric clues in three-dimensional reconstruction | |
Goncalves et al. | Deepdive: An end-to-end dehazing method using deep learning | |
CN114998411B (en) | Self-supervision monocular depth estimation method and device combining space-time enhancement luminosity loss | |
Jeon et al. | Struct-MDC: Mesh-refined unsupervised depth completion leveraging structural regularities from visual SLAM | |
Yang et al. | SAM-Net: Semantic probabilistic and attention mechanisms of dynamic objects for self-supervised depth and camera pose estimation in visual odometry applications | |
CN113920270B (en) | Layout reconstruction method and system based on multi-view panorama | |
Tian et al. | Monocular depth estimation based on a single image: a literature review | |
Lu et al. | Stereo disparity optimization with depth change constraint based on a continuous video | |
CN117876452A (en) | Self-supervision depth estimation method and system based on moving object pose estimation | |
Zhang et al. | Depth map prediction from a single image with generative adversarial nets | |
CN112308893A (en) | Monocular depth estimation method based on iterative search strategy | |
CN112308917A (en) | Vision-based mobile robot positioning method | |
Bhutani et al. | Unsupervised Depth and Confidence Prediction from Monocular Images using Bayesian Inference | |
CN113160247B (en) | Anti-noise twin network target tracking method based on frequency separation | |
CN115239559A (en) | Depth map super-resolution method and system for fusion view synthesis | |
Liu et al. | Binocular depth estimation using convolutional neural network with Siamese branches | |
Yuan et al. | SceneFactory: A Workflow-centric and Unified Framework for Incremental Scene Modeling | |
Chowdhury et al. | An efficient algorithm for stereo correspondence matching | |
Hu et al. | Self-supervised monocular visual odometry based on cross-correlation | |
Fan et al. | Deeper into Self-Supervised Monocular Indoor Depth Estimation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |