CN114998411A - Self-supervised monocular depth estimation method and device combining spatio-temporal enhanced photometric loss - Google Patents
Self-supervised monocular depth estimation method and device combining spatio-temporal enhanced photometric loss
- Publication number: CN114998411A
- Application number: CN202210475411.0A
- Authority: CN (China)
- Prior art keywords: photometric, reconstruction, depth, loss
- Prior art date: 2022-04-29
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/55 — Image analysis; depth or shape recovery from multiple images
- G06T7/70 — Image analysis; determining position or orientation of objects or cameras
- G06T2207/10028 — Image acquisition modality: range image; depth image; 3D point clouds
- G06T2207/20081 — Special algorithmic details: training; learning
- G06T2207/20084 — Special algorithmic details: artificial neural networks [ANN]
- Y02T10/40 — Climate change mitigation technologies related to transportation: engine management systems
Abstract
The invention relates to a self-supervised monocular depth estimation method and device combining spatio-temporal enhanced photometric loss. The method comprises the following steps: acquiring a plurality of adjacent frame images in an image sequence; and inputting the images into a trained deep learning network to obtain depth information and pose information, wherein the photometric loss of the deep learning network is obtained from a spatial transformation model based on depth-aware pixel correspondence, and an omnidirectional auto-mask prevents pixels of moving objects from participating in the photometric error calculation. The method improves the accuracy of the photometric loss and thus better supervises the learning of the depth network.
Description
Technical Field
The invention relates to the technical field of computer vision, and in particular to a self-supervised monocular depth estimation method and device combining spatio-temporal enhanced photometric loss.
Background
Estimating the depth of a scene from an image, i.e., image depth estimation, is a fundamental task in computer vision. A good depth estimation algorithm can be applied to outdoor driving scenes, indoor mobile robots and other fields, and has great application value. While a robot or an autonomous vehicle is operating, a depth estimation algorithm provides scene depth information that assists path planning and obstacle avoidance for its next movement.
Image-based depth estimation methods are divided into supervised and self-supervised approaches. Supervised methods use a neural network to learn a mapping from an image to a depth map, trained under the supervision of ground-truth depth so that the network gradually acquires the ability to fit depth. However, because ground truth is expensive to obtain, self-supervised methods have become mainstream in recent years. Among these, methods trained on monocular image sequences have drawn wide attention from researchers because they apply more broadly than methods requiring binocular (stereo) training images.
A self-supervised monocular depth framework based on image sequences mainly comprises a depth estimation network and a pose estimation network, which respectively predict the depth of a target frame and the pose transformation between the target frame and a source frame. Combining the estimated depth and pose, the source frame is warped into the coordinate system of the target frame to obtain a reconstructed image, and the photometric difference between the target frame and the reconstructed image, i.e., the photometric loss, supervises the joint training of the two networks. As the photometric loss decreases, the depth estimated by the network becomes increasingly accurate.
Generating the photometric loss requires a spatial transformation model. Although the existing spatial transformation model conforms in theory to rigid-body transformation, errors in the translation vector of the pose introduce depth estimation errors during the calculation: the larger the depth, the larger the depth estimation error. In addition, to handle the inaccurate photometric loss caused by moving pixels that violate photometric consistency, the main idea of existing approaches is to find a binary mask during training that filters out pixels whose photometric value does not change from one frame to the next; however, such a binary mask can only identify objects moving in the same direction as the camera.
Disclosure of Invention
The inventors found the reason why the depth estimation error grows with depth to be as follows. The purpose of the spatial transformation is to make corresponding pixels in the target frame and the source frame coincide on the pixel plane; suppose a near point $P_N$ is used to solve the correspondence between pixels $p_t$ and $p_s$, which project from the same 3D point, as shown in FIG. 1. The principle of self-supervised depth estimation is to make the estimated pose and depth more accurate by minimizing the photometric difference between $p_t$ and $p_s$. For near regions, as shown in FIG. 1, with a fixed number of points the estimated pose becomes accurate, and the depth performs well, only when $p_t$ and the transformed point $p_F$ nearly coincide. For distant regions, as shown in FIG. 2, only the accuracy of the predicted rotation matrix can guarantee that the photometric error between $p_t$ and $p_s$ becomes small. Therefore, if the photometric error is constructed from the estimated rotation matrix and translation vector without distinguishing near from far, its uncertainty increases greatly, degrading the depth estimation result.
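The effect can be made explicit by rearranging the reprojection relation used later in the description (a sketch added here for clarity, not a formula from the original text):

$$p_s \sim K\bigl(R_{t\to s}\, D_t K^{-1} p_t + t_{t\to s}\bigr) \sim K\Bigl(R_{t\to s}\, K^{-1} p_t + \frac{t_{t\to s}}{D_t}\Bigr)$$

so as $D_t$ grows, the influence of the translation on $p_s$ vanishes like $1/D_t$: the reprojection of a far pixel is governed almost entirely by the rotation, and a large error in its estimated depth incurs almost no photometric penalty.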
The technical problem the invention aims to solve is to provide a self-supervised monocular depth estimation method and device combining spatio-temporal enhanced photometric loss, which improve the accuracy of the photometric loss and thus better supervise the learning of the depth network.
The technical solution adopted by the invention to solve the above technical problem is as follows: a self-supervised monocular depth estimation method combining spatio-temporal enhanced photometric loss, comprising the following steps:
acquiring a plurality of adjacent frame images in an image sequence;
and inputting the images into a trained deep learning network to obtain depth information and pose information, wherein the photometric loss of the deep learning network is obtained from a spatial transformation model based on depth-aware pixel correspondence, and an omnidirectional auto-mask prevents pixels of moving objects from participating in the photometric error calculation.
Obtaining the photometric loss from the spatial transformation model based on depth-aware pixel correspondence specifically comprises:
performing the spatial transformation of distant regions with a homography matrix and constructing a first reconstruction map, wherein a sufficiently distant region is treated as the plane at infinity;
performing the spatial transformation with the basic matrix and constructing a second reconstruction map;
and solving a photometric error map based on the first reconstruction map and a photometric error map based on the second reconstruction map through the two pixel correspondences, and then taking the pixel-wise minimum to obtain the final photometric loss.
Using the omnidirectional auto-mask to prevent pixels of moving objects from participating in the photometric error calculation specifically comprises:
predicting the initial depth and initial pose of a target frame with a pre-trained network, and generating an initial reconstruction map;
adding perturbation terms to the initial pose and obtaining a plurality of hypothesized reconstruction frames by spatial transformation; generating a plurality of photometric error maps from the hypothesized reconstruction frames and the photometric values of the target frame, and obtaining a plurality of binary masks from the photometric error maps;
and taking the pixel-wise minimum over the plurality of binary masks as the final mask.
The perturbation terms are translational perturbation terms, comprising $[t_{\max},0,0]$, $[-t_{\max},0,0]$, $[0,0,t_{\max}]$ and $[0,0,-t_{\max}]$, where $t_{\max}$ denotes the maximum value in the initialized translation vector.
The technical solution adopted by the invention to solve the above technical problem also provides an apparatus for self-supervised monocular depth estimation combining spatio-temporal enhanced photometric loss, comprising:
an acquisition module for acquiring a plurality of adjacent frame images in an image sequence;
an estimation module for inputting the images into a trained deep learning network to obtain depth information and pose information, wherein the photometric loss of the deep learning network is obtained from the spatial transformation model of a depth-aware pixel correspondence module, and an omnidirectional auto-mask module prevents pixels of moving objects from participating in the photometric error calculation.
The depth-aware pixel correspondence module comprises:
a first construction unit for performing the spatial transformation of distant regions with a homography matrix and constructing a first reconstruction map, wherein a sufficiently distant region is treated as the plane at infinity;
a second construction unit for performing the spatial transformation with the basic matrix and constructing a second reconstruction map;
and a photometric loss acquisition unit for solving a photometric error map based on the first reconstruction map and a photometric error map based on the second reconstruction map through the two pixel correspondences, and then taking the pixel-wise minimum to obtain the final photometric loss.
The omnidirectional auto-mask module comprises:
an initial reconstruction map generation unit for predicting the initial depth and initial pose of the target frame with a pre-trained network and generating an initial reconstruction map;
a binary mask generation unit for adding perturbation terms to the initial pose, obtaining a plurality of hypothesized reconstruction frames by spatial transformation, generating a plurality of photometric error maps from the hypothesized reconstruction frames and the photometric values of the target frame, and obtaining a plurality of binary masks from the photometric error maps;
and a mask selection unit for taking the pixel-wise minimum over the plurality of binary masks as the final mask.
The perturbation terms are translational perturbation terms, comprising $[t_{\max},0,0]$, $[-t_{\max},0,0]$, $[0,0,t_{\max}]$ and $[0,0,-t_{\max}]$, where $t_{\max}$ denotes the maximum value in the initialized translation vector.
Advantageous effects
Owing to the above technical solution, compared with the prior art, the invention has the following advantages and positive effects: the invention mines the pixel correspondence of distant regions through depth-aware pixel correspondence, alleviating the inaccurate pixel correspondence of distant regions, and obtains an omnidirectional binary mask through the omnidirectional auto-mask, preventing pixels of moving objects from participating in the photometric error calculation. By improving the spatial transformation and generating an automatic mask for dynamic objects, the invention improves the accuracy of the photometric loss and thus better supervises the learning of the depth network.
Drawings
FIG. 1 is a schematic diagram of pose solving at a near point;
FIG. 2 is a schematic diagram of pose solving at a distant point;
FIG. 3 is a schematic diagram of the Monodepth2 basic framework;
FIG. 4 is a schematic diagram of photometric loss generation in the first embodiment of the present invention;
FIG. 5 is a schematic diagram of the omnidirectional auto-mask in the first embodiment of the present invention.
Detailed Description
The invention will be further illustrated with reference to the following specific embodiments. It should be understood that these embodiments are for illustrative purposes only and are not intended to limit the scope of the invention. Furthermore, it should be understood that, after reading the teaching of the invention, those skilled in the art may make various changes or modifications to it, and such equivalents likewise fall within the scope defined by the appended claims.
The first embodiment of the invention relates to a self-supervised monocular depth estimation method combining spatio-temporal enhanced photometric loss, comprising: acquiring a plurality of adjacent frame images in an image sequence; and inputting the images into a trained deep learning network to obtain depth information and pose information, wherein the photometric loss of the deep learning network is obtained from a spatial transformation model based on depth-aware pixel correspondence, and an omnidirectional auto-mask prevents pixels of moving objects from participating in the photometric error calculation.
The method of this embodiment can be used directly in general self-supervised monocular depth estimation; any work built on the SfMLearner framework can use it. One only needs to replace the spatial transformation part of the original framework with the depth-aware pixel correspondence spatial transformation model of this embodiment, and the auto-mask part with the omnidirectional auto-mask of this application.
The invention is further illustrated below by way of example on the Monodepth2 basic framework of Godard et al.
For easier understanding, the overall framework of Monodepth2 is described first. As shown in FIG. 3, its input is three adjacent RGB frames of a sequence, and its output is the depth of the target frame and the pose transformation between the target frame and the source frames.
The basic framework of this embodiment is the same as that of FIG. 3. Since the improvements of this embodiment mainly concern the generation of the photometric loss by spatial transformation and the auto-mask part, these two parts of Monodepth2 are described first.
Monodepth2 uses the same spatial transformation model as SfMLearner to obtain the depth $D_t$ of the target frame $I_t$ and the pose $T_{t\to s} = [R_{t\to s} \mid t_{t\to s}]$ between the target frame $I_t$ and the source frame $I_s$. Corresponding pixels $p_t$ and $p_s$ of the target frame and the source frame that project from the same 3D point should satisfy:

$$D_s K^{-1} p_s = T_{t\to s}\, D_t K^{-1} p_t$$

where $K$ is the camera intrinsic matrix. Since monocular depth has scale ambiguity, the following relation (up to scale) can be used for the spatial transformation:

$$p_s \sim K T_{t\to s} D_t K^{-1} p_t$$

In this spatial geometric transformation, $K T_{t\to s} K^{-1}$ is defined as the basic matrix $F$ for the inter-frame pixel correspondence. This relation can then be used to construct a reconstructed frame $\hat{I}$.
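As a concrete illustration, the correspondence $p_s \sim K T_{t\to s} D_t K^{-1} p_t$ can be realised as an inverse-warp sampling grid. The following PyTorch sketch mirrors the standard SfMLearner/Monodepth2 projection step; the function name, tensor shapes, and 4×4 padded intrinsics are our assumptions, not taken from the patent:

```python
import torch
import torch.nn.functional as F

def projection_grid(depth_t, K, K_inv, T_t2s):
    """Sketch of p_s ~ K T_{t->s} D_t K^{-1} p_t as a grid_sample grid.

    depth_t: (B,1,H,W) target-frame depth; K, K_inv: (B,4,4) padded intrinsics;
    T_t2s:   (B,4,4) target-to-source pose. Returns a (B,H,W,2) grid in [-1,1].
    """
    b, _, h, w = depth_t.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).view(1, 3, -1)  # homogeneous p_t
    cam = depth_t.view(b, 1, -1) * (K_inv[:, :3, :3] @ pix)             # D_t K^{-1} p_t
    cam = torch.cat([cam, torch.ones_like(cam[:, :1])], 1)              # homogeneous 3D point
    p_s = K[:, :3, :] @ (T_t2s @ cam)                                   # K T_{t->s} (.)
    p_s = p_s[:, :2] / p_s[:, 2:3].clamp(min=1e-7)                      # perspective divide
    gx = p_s[:, 0].view(b, h, w) / (w - 1) * 2 - 1                      # normalise to [-1,1]
    gy = p_s[:, 1].view(b, h, w) / (h - 1) * 2 - 1
    return torch.stack([gx, gy], dim=-1)

# Reconstructed frame: I_hat = F.grid_sample(I_s, projection_grid(...), align_corners=True)
```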
From the target frame and the reconstructed frame, a photometric loss $pe$ can be constructed, consisting of an L1 error and a structural similarity (SSIM) error:

$$pe(I_a, I_b) = \frac{\alpha}{2}\bigl(1 - \mathrm{SSIM}(I_a, I_b)\bigr) + (1 - \alpha)\,\lVert I_a - I_b \rVert_1$$

where $\alpha$ is a hyper-parameter, set to 0.85 in Monodepth2.
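A minimal PyTorch sketch of this loss (the 3×3 average-pool SSIM follows Monodepth2's public implementation; the name `photometric_error` is ours):

```python
import torch
import torch.nn.functional as F

def photometric_error(pred, target, alpha=0.85):
    """Per-pixel pe = alpha/2 * (1 - SSIM) + (1 - alpha) * L1; returns (B,1,H,W)."""
    l1 = (pred - target).abs().mean(1, keepdim=True)

    # Simplified SSIM over a 3x3 window, as in Monodepth2.
    mu_x, mu_y = F.avg_pool2d(pred, 3, 1, 1), F.avg_pool2d(target, 3, 1, 1)
    sig_x = F.avg_pool2d(pred ** 2, 3, 1, 1) - mu_x ** 2
    sig_y = F.avg_pool2d(target ** 2, 3, 1, 1) - mu_y ** 2
    sig_xy = F.avg_pool2d(pred * target, 3, 1, 1) - mu_x * mu_y
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    ssim = ((2 * mu_x * mu_y + c1) * (2 * sig_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (sig_x + sig_y + c2))
    dssim = ((1 - ssim) / 2).clamp(0, 1).mean(1, keepdim=True)

    return alpha * dssim + (1 - alpha) * l1
```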
The auto-masking of Monodepth2 mainly addresses the inaccurate photometric loss caused by moving pixels in the image that violate photometric consistency. The main idea is to filter out, during training, the pixels whose photometric value does not change from one frame to the next; the generated binary mask $\mu$ is:

$$\mu = \Bigl[\, \min_s pe(I_t, \hat{I}_{s\to t}) < \min_s pe(I_t, I_s) \,\Bigr]$$

where $[\,\cdot\,]$ is the Iverson bracket used to generate the binary mask, $I_t$ is the target frame, $I_s$ is a source frame, and $\hat{I}_{s\to t}$ is the reconstructed frame resulting from the spatial transformation.
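A sketch of this mask built on `photometric_error` above (the function name and list-based interface are ours):

```python
import torch

def monodepth2_auto_mask(I_t, sources, warped):
    """mu = [ min_s pe(I_t, I_hat_{s->t}) < min_s pe(I_t, I_s) ], per pixel.

    sources / warped: lists of (B,3,H,W) source frames and their spatially
    transformed reconstructions. Returns a float (B,1,H,W) binary mask.
    """
    pe_warp = torch.stack([photometric_error(w, I_t) for w in warped]).min(0).values
    pe_src = torch.stack([photometric_error(s, I_t) for s in sources]).min(0).values
    return (pe_warp < pe_src).float()   # Iverson bracket, 1 = keep pixel
```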
For the photometric loss generated by spatial transformation, this embodiment obtains it from the spatial transformation model based on depth-aware pixel correspondence. As shown in FIG. 4, the details are as follows:
During the spatial transformation, a sufficiently distant region can be regarded as the plane at infinity. A plane satisfies:

$$n^T P + D = 0$$

where $n$ is the normal vector of the plane, $P$ is a three-dimensional point on the plane, and $D$ is the distance of the plane from the origin. Substituting this into the spatial transformation relation yields:

$$p_s \sim K\left(R_{t\to s} - \frac{t_{t\to s}\, n^T}{D}\right) D_t K^{-1} p_t$$

When $D$ tends to infinity, i.e., for the plane at infinity:

$$p_s \sim K R_{t\to s} D_t K^{-1} p_t$$

$K R_{t\to s} K^{-1}$ is defined as the homography matrix at infinity $H_\infty$, so for distant regions the reconstruction map $\hat{I}_{H_\infty}$ is constructed by a spatial transformation that uses only the rotation matrix. For distinction, the reconstruction map obtained with the basic matrix is denoted $\hat{I}_F$. Since monocularly estimated depth has scale ambiguity, the choice between the two pixel correspondences cannot be made directly from the predicted depth. This embodiment therefore designs an adaptive selection method: the two photometric error maps are solved from the two pixel correspondences, and the pixel-wise minimum is taken, i.e., the final photometric error is:

$$pe_{final} = \min\Bigl(pe(I_t, \hat{I}_F),\; pe(I_t, \hat{I}_{H_\infty})\Bigr)$$
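A sketch of the adaptive selection, reusing `projection_grid` and `photometric_error` above. Zeroing the translation in the pose before projecting is one way to realise the rotation-only warp $H_\infty = K R_{t\to s} K^{-1}$, since the depth factor cancels in the perspective divide; all names are our assumptions:

```python
import torch
import torch.nn.functional as F

def depth_aware_photometric_error(I_t, I_s, depth_t, K, K_inv, T_t2s):
    """pe_final = min( pe(I_t, I_hat_F), pe(I_t, I_hat_Hinf) ), per pixel."""
    # Reconstruction via the basic matrix F: full pose [R | t].
    grid_F = projection_grid(depth_t, K, K_inv, T_t2s)
    I_hat_F = F.grid_sample(I_s, grid_F, align_corners=True)

    # Reconstruction via H_inf: rotation only, translation zeroed.
    T_rot = T_t2s.clone()
    T_rot[:, :3, 3] = 0.0
    grid_H = projection_grid(depth_t, K, K_inv, T_rot)
    I_hat_H = F.grid_sample(I_s, grid_H, align_corners=True)

    # Pixel-wise minimum over the two photometric error maps.
    return torch.minimum(photometric_error(I_hat_F, I_t),
                         photometric_error(I_hat_H, I_t))
```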
for the omnidirectional automatic mask, the embodiment directly inputs the image sequence into the module, and after obtaining the mask result, the mask result is applied to the photometric error to shield the unreliable part, as shown in fig. 5, specifically as follows:
the embodiment introduces a Monodepth2 pre-training network to predict the initial of the target frameDepth D init And initial frame pose T init Further generating an initial reconstruction map I init . Because the depth and the pose are accurate, the luminosity error of the region which accords with the luminosity consistency is small, but the potential of the region which does not accord with the luminosity consistency is small.
Perturbation terms are added to the initial pose to introduce a plurality of perturbed poses, from which a plurality of hypothesized reconstruction frames are obtained by spatial transformation. Using these reconstructed frames $I_i$, $i \in \{1, 2, \dots\}$, together with the photometric values of the target frame, a plurality of photometric error maps can be generated, and by comparing the photometric error values a plurality of binary masks corresponding to the pixels of objects moving in each direction can be obtained:

$$M_i = \bigl[\, pe(I_t, I_{init}) < pe(I_t, I_i) \,\bigr]$$
To capture objects moving in every direction, the pixel-wise minimum over the generated masks is taken as the final mask:

$$M_{OA} = \min(M_1, M_2, \dots)$$
In the implementation of this embodiment, only the translation vector is perturbed; the specific translational perturbation terms $t_i$ are $t_1 = [t_{\max}, 0, 0]$, $t_2 = [-t_{\max}, 0, 0]$, $t_3 = [0, 0, t_{\max}]$ and $t_4 = [0, 0, -t_{\max}]$, where $t_{\max}$ is the maximum value in the initialized translation vector.
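A sketch of the omnidirectional mask with the four translational perturbations built from $t_{\max}$. The callback `warp_fn`, which returns the hypothesized reconstruction for a given translation perturbation, and all other names are our assumptions:

```python
import torch

def omnidirectional_auto_mask(I_t, I_init, t_init, warp_fn):
    """M_OA = min_i M_i, where M_i = [ pe(I_t, I_init) < pe(I_t, I_i) ].

    I_init: reconstruction from the pre-trained initial depth and pose.
    t_init: (3,) initialized translation vector.
    warp_fn(delta_t) -> hypothesized reconstruction I_i for perturbation delta_t.
    """
    t_max = float(t_init.abs().max())
    deltas = [torch.tensor([ t_max, 0.0, 0.0]),   # t_1
              torch.tensor([-t_max, 0.0, 0.0]),   # t_2
              torch.tensor([0.0, 0.0,  t_max]),   # t_3
              torch.tensor([0.0, 0.0, -t_max])]   # t_4

    pe_init = photometric_error(I_init, I_t)
    masks = [(pe_init < photometric_error(warp_fn(d), I_t)).float()   # M_i
             for d in deltas]
    return torch.stack(masks).min(0).values                           # M_OA
```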
It is not difficult to see that the invention mines the pixel correspondence of distant regions through depth-aware pixel correspondence, alleviating the inaccurate pixel correspondence of distant regions, and obtains an omnidirectional binary mask through the omnidirectional auto-mask, preventing pixels of moving objects from participating in the photometric error calculation. By improving the spatial transformation and generating an automatic mask for dynamic objects, the invention improves the accuracy of the photometric loss and thus better supervises the learning of the depth network. Therefore, applying the depth-aware pixel correspondence and the omnidirectional auto-mask of this embodiment to the Monodepth2 framework of Godard et al. yields monocular depth estimation results of higher accuracy.
A second embodiment of the invention relates to an apparatus for self-supervised monocular depth estimation combining spatio-temporal enhanced photometric loss, comprising: an acquisition module for acquiring a plurality of adjacent frame images in an image sequence; and an estimation module for inputting the images into a trained deep learning network to obtain depth information and pose information, wherein the photometric loss of the deep learning network is obtained from the spatial transformation model of a depth-aware pixel correspondence module, and an omnidirectional auto-mask module prevents pixels of moving objects from participating in the photometric error calculation.
The depth-aware pixel correspondence module comprises: a first construction unit for performing the spatial transformation of distant regions with a homography matrix and constructing a first reconstruction map, wherein a sufficiently distant region is treated as the plane at infinity; a second construction unit for performing the spatial transformation with the basic matrix and constructing a second reconstruction map; and a photometric loss acquisition unit for solving a photometric error map based on the first reconstruction map and a photometric error map based on the second reconstruction map through the two pixel correspondences, and then taking the pixel-wise minimum to obtain the final photometric loss.
The omnidirectional auto-mask module comprises: an initial reconstruction map generation unit for predicting the initial depth and initial pose of the target frame with a pre-trained network and generating an initial reconstruction map; a binary mask generation unit for adding perturbation terms to the initial pose, obtaining a plurality of hypothesized reconstruction frames by spatial transformation, generating a plurality of photometric error maps from the hypothesized reconstruction frames and the photometric values of the target frame, and obtaining a plurality of binary masks from the photometric error maps; and a mask selection unit for taking the pixel-wise minimum over the plurality of binary masks as the final mask. The perturbation terms are translational perturbation terms, comprising $[t_{\max},0,0]$, $[-t_{\max},0,0]$, $[0,0,t_{\max}]$ and $[0,0,-t_{\max}]$, where $t_{\max}$ denotes the maximum value in the initialized translation vector.
Claims (8)
1. A self-supervised monocular depth estimation method combining spatio-temporal enhanced photometric loss, characterized by comprising the following steps:
acquiring a plurality of adjacent frame images in an image sequence;
and inputting the images into a trained deep learning network to obtain depth information and pose information, wherein the photometric loss of the deep learning network is obtained from a spatial transformation model based on depth-aware pixel correspondence, and an omnidirectional auto-mask prevents pixels of moving objects from participating in the photometric error calculation.
2. The self-supervised monocular depth estimation method combining spatio-temporal enhanced photometric loss according to claim 1, wherein obtaining the photometric loss from the spatial transformation model based on depth-aware pixel correspondence specifically comprises:
performing the spatial transformation of distant regions with a homography matrix and constructing a first reconstruction map, wherein a sufficiently distant region is treated as the plane at infinity;
performing the spatial transformation with the basic matrix and constructing a second reconstruction map;
and solving a photometric error map based on the first reconstruction map and a photometric error map based on the second reconstruction map through the two pixel correspondences, and then taking the pixel-wise minimum to obtain the final photometric loss.
3. The self-supervised monocular depth estimation method combining spatio-temporal enhanced photometric loss according to claim 1, wherein using the omnidirectional auto-mask to prevent pixels of moving objects from participating in the photometric error calculation specifically comprises:
predicting the initial depth and initial pose of a target frame with a pre-trained network, and generating an initial reconstruction map;
adding perturbation terms to the initial pose and obtaining a plurality of hypothesized reconstruction frames by spatial transformation; generating a plurality of photometric error maps from the hypothesized reconstruction frames and the photometric values of the target frame, and obtaining a plurality of binary masks from the photometric error maps;
and taking the pixel-wise minimum over the plurality of binary masks as the final mask.
4. The method of claim 3, wherein the perturbation terms are translational perturbation terms comprising $[t_{\max},0,0]$, $[-t_{\max},0,0]$, $[0,0,t_{\max}]$ and $[0,0,-t_{\max}]$, where $t_{\max}$ denotes the maximum value in the initialized translation vector.
5. An apparatus for self-supervised monocular depth estimation combining spatio-temporal enhanced photometric loss, characterized by comprising:
an acquisition module for acquiring a plurality of adjacent frame images in an image sequence;
an estimation module for inputting the images into a trained deep learning network to obtain depth information and pose information;
wherein the photometric loss of the deep learning network is obtained from the spatial transformation model of a depth-aware pixel correspondence module, and an omnidirectional auto-mask module prevents pixels of moving objects from participating in the photometric error calculation.
6. The apparatus of claim 5, wherein the depth-aware pixel correspondence module comprises:
a first construction unit for performing the spatial transformation of distant regions with a homography matrix and constructing a first reconstruction map,
wherein a sufficiently distant region is treated as the plane at infinity;
a second construction unit for performing the spatial transformation with the basic matrix and constructing a second reconstruction map;
and a photometric loss acquisition unit for solving a photometric error map based on the first reconstruction map and a photometric error map based on the second reconstruction map through the two pixel correspondences, and then taking the pixel-wise minimum to obtain the final photometric loss.
7. The apparatus for self-supervised monocular depth estimation combining spatio-temporal enhanced photometric loss of claim 5, wherein the omnidirectional auto-mask module comprises:
an initial reconstruction map generation unit for predicting the initial depth and initial pose of the target frame with a pre-trained network and generating an initial reconstruction map;
a binary mask generation unit for adding perturbation terms to the initial pose, obtaining a plurality of hypothesized reconstruction frames by spatial transformation, generating a plurality of photometric error maps from the hypothesized reconstruction frames and the photometric values of the target frame, and obtaining a plurality of binary masks from the photometric error maps;
and a mask selection unit for taking the pixel-wise minimum over the plurality of binary masks as the final mask.
8. The apparatus of claim 7, wherein the perturbation terms are translational perturbation terms comprising $[t_{\max},0,0]$, $[-t_{\max},0,0]$, $[0,0,t_{\max}]$ and $[0,0,-t_{\max}]$, where $t_{\max}$ denotes the maximum value in the initialized translation vector.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202210475411.0A | 2022-04-29 | 2022-04-29 | Self-supervised monocular depth estimation method and device combining spatio-temporal enhanced photometric loss
Publications (2)

Publication Number | Publication Date
---|---
CN114998411A (application) | 2022-09-02
CN114998411B (grant) | 2024-01-09

Family ID: 83025390
Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202210475411.0A | Self-supervised monocular depth estimation method and device combining spatio-temporal enhanced photometric loss (granted as CN114998411B, active) | 2022-04-29 | 2022-04-29

Country Status (1)

Country | Link
---|---
CN | CN114998411B (en)
Patent Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110264509A (en) * | 2018-04-27 | 2019-09-20 | Tencent Technology (Shenzhen) Co., Ltd. | Method, apparatus and storage medium for determining the pose of an image-capturing device |
US20190356905A1 (en) * | 2018-05-17 | 2019-11-21 | Niantic, Inc. | Self-supervised training of a depth estimation system |
US20200211206A1 (en) * | 2018-12-27 | 2020-07-02 | Baidu Usa Llc | Joint learning of geometry and motion with three-dimensional holistic understanding |
US20210118184A1 (en) * | 2019-10-17 | 2021-04-22 | Toyota Research Institute, Inc. | Systems and methods for self-supervised scale-aware training of a model for monocular depth estimation |
CN111260680A (en) * | 2020-01-13 | 2020-06-09 | 杭州电子科技大学 | RGBD camera-based unsupervised pose estimation network construction method |
CN111369608A (en) * | 2020-05-29 | 2020-07-03 | 南京晓庄学院 | Visual odometer method based on image depth estimation |
US20210390723A1 (en) * | 2020-06-15 | 2021-12-16 | Dalian University Of Technology | Monocular unsupervised depth estimation method based on contextual attention mechanism |
CN111739078A (en) * | 2020-06-15 | 2020-10-02 | 大连理工大学 | Monocular unsupervised depth estimation method based on context attention mechanism |
US20210398301A1 (en) * | 2020-06-17 | 2021-12-23 | Toyota Research Institute, Inc. | Camera agnostic depth network |
CN111783582A (en) * | 2020-06-22 | 2020-10-16 | 东南大学 | Unsupervised monocular depth estimation algorithm based on deep learning |
CN113160390A (en) * | 2021-04-28 | 2021-07-23 | 北京理工大学 | Three-dimensional dense reconstruction method and system |
CN113240722A (en) * | 2021-04-28 | 2021-08-10 | 浙江大学 | Self-supervision depth estimation method based on multi-frame attention |
CN113570658A (en) * | 2021-06-10 | 2021-10-29 | 西安电子科技大学 | Monocular video depth estimation method based on depth convolutional network |
CN113313732A (en) * | 2021-06-25 | 2021-08-27 | 南京航空航天大学 | Forward-looking scene depth estimation method based on self-supervision learning |
CN113450410A (en) * | 2021-06-29 | 2021-09-28 | 浙江大学 | Monocular depth and pose joint estimation method based on epipolar geometry |
CN114022799A (en) * | 2021-09-23 | 2022-02-08 | 中国人民解放军军事科学院国防科技创新研究院 | Self-supervision monocular depth estimation method and device |
CN114170286A (en) * | 2021-11-04 | 2022-03-11 | 西安理工大学 | Monocular depth estimation method based on unsupervised depth learning |
Non-Patent Citations (5)

Title |
---|
T. Zhou et al.: "Unsupervised learning of depth and ego-motion from video", 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1851-1858 |
Jiang Haochen et al.: "Indoor dynamic scene RGB-D SLAM algorithm based on semantic priors and depth constraints", Information and Control, vol. 50, 2021, pp. 275-286 |
Cen Shijie et al.: "Monocular depth estimation combining attention and unsupervised deep learning", Journal of Guangdong University of Technology, no. 04, pp. 35-41 |
Hu Zhicheng: "Monocular image depth estimation based on unsupervised learning", China Master's Theses Full-text Database, Information Science & Technology, 2021, pp. 138-615 |
Zhan Yan: "Research on image depth estimation methods based on domain adaptation", China Master's Theses Full-text Database, Information Science & Technology, 2021, pp. 138-811 |
Cited By (2)

Publication number | Priority date | Publication date | Title
---|---|---|---
CN116245927A | 2023-02-09 | 2023-06-09 | ConvDepth-based self-supervised monocular depth estimation method and system
CN116245927B | 2023-02-09 | 2024-01-16 | ConvDepth-based self-supervised monocular depth estimation method and system
Also Published As
Publication number | Publication date |
---|---|
CN114998411B (en) | 2024-01-09 |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |