CN111105432A - Unsupervised end-to-end driving environment perception method based on deep learning - Google Patents
Unsupervised end-to-end driving environment perception method based on deep learning
- Publication number
- CN111105432A (application CN201911345900.9A)
- Authority
- CN
- China
- Prior art keywords
- estimation network
- pose
- depth
- flow
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06T7/215—Motion-based segmentation (Image analysis; Analysis of motion)
- G06T7/269—Analysis of motion using gradient-based methods (Image analysis)
- G06N3/045—Combinations of networks (Neural networks; Architecture, e.g. interconnection topology)
- G06N3/08—Learning methods (Neural networks)
- G06T2207/10016—Video; Image sequence (Image acquisition modality)
- G06T2207/10024—Color image (Image acquisition modality)
- G06T2207/20081—Training; Learning (Special algorithmic details)
- G06T2207/20084—Artificial neural networks [ANN] (Special algorithmic details)
- Y02T10/40—Engine management systems (Climate change mitigation technologies related to transportation)
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses an unsupervised end-to-end driving environment perception method based on deep learning, which comprises the following steps: acquiring images with a binocular camera and preprocessing them to obtain training data; training an optical flow estimation network, a pose estimation network, a depth estimation network and motion segmentation using two consecutive stereo image pairs of the same size from the training data; performing rigid registration using the outputs of the three networks to optimize the output of the pose estimation network; and calculating the rigid flow caused by camera motion from the output of the depth estimation network and the optimized pose, then performing a flow consistency check against the output of the optical flow estimation network to obtain the motion segmentation. The method adopts an unsupervised end-to-end framework that requires no ground-truth depth, pose or optical flow as training labels, and can estimate camera pose with absolute scale and dense depth maps, thereby segmenting dynamic objects with high accuracy.
Description
Technical Field
The invention relates to the technical field of intelligent driving, in particular to an unsupervised end-to-end driving environment perception method based on deep learning.
Background
Learning three-dimensional scene geometry, scene flow, and robot motion relative to a rigid scene from video images is an important research topic in computer vision, with widespread applications in many fields, including autonomous driving, robot navigation, and video analysis. However, current environment perception methods based on deep learning are supervised learning frameworks, and obtaining the ground-truth labels needed for training is very difficult. In recent years, much progress has been made in unsupervised learning of depth, optical flow, and pose using convolutional neural networks. These methods have their own advantages and limitations. Unsupervised deep learning approaches exploit the geometry of the scene and decompose the problem into multiple orthogonal sub-problems, adding more constraints to the solution through additional temporal image frames or stereo image information. On the one hand, current deep-learning-based optical flow, depth and pose estimation methods assume that the entire scene is static, and therefore have difficulty handling moving objects. On the other hand, optical flow methods can in principle handle moving objects, but struggle in regions with complicated structure and in occluded regions.
The Chinese patent "Method for estimating and optimizing the depth of a monocular view in a video sequence by using deep learning" (publication number CN108765479A) estimates and optimizes the depth of a monocular view in a video sequence using deep learning, but monocular-vision-based methods suffer from scale ambiguity, so the scale of the estimated depth is unknown and of no practical value.
Chinese patent "a binocular depth estimation method based on a depth convolution network" (publication number: CN109598754A) trains a deep convolution neural network to perform depth estimation by using binocular images, but a true value depth is required to participate in training as a label in the training process, but it is very difficult and expensive to obtain the true value depth in an actual environment.
Chinese patent "a monocular vision positioning method based on unsupervised learning" (publication number: CN109472830A) utilizes the method of unsupervised learning to carry out monocular vision positioning, but monocular vision positioning has scale uncertainty and scale drift, positioning accuracy is poor, and positioning scale uncertainty has no engineering value in actual environment.
Therefore, the current driving environment perception method based on deep learning still has the following problems:
1) deep learning models for depth and pose estimation trained on monocular image sequences are limited by monocular scale ambiguity and scale drift; the scale of the estimated depth and pose is unknown, so such models have no practical application value;
2) current deep-learning-based depth, pose and optical flow estimation methods require supervised training with ground truth, but acquiring ground-truth data in a real environment is very difficult and costly;
3) dynamic objects are very common in real driving environments; current deep-learning-based environment perception methods do not consider the influence of dynamic objects, and their accuracy needs further improvement.
Disclosure of Invention
The invention aims to provide an unsupervised end-to-end driving environment perception method based on deep learning. It adopts an unsupervised end-to-end framework that requires no ground-truth depth, pose or optical flow as training labels, and can estimate camera pose with absolute scale and dense depth maps, thereby segmenting dynamic objects with high accuracy.
The purpose of the invention is realized by the following technical scheme:
an unsupervised end-to-end driving environment perception method based on deep learning comprises the following steps:
acquiring images by using a binocular camera, and preprocessing to obtain training data;
training an optical flow estimation network, a pose estimation network, a depth estimation network and motion segmentation using two consecutive stereo image pairs of the same size from the training data;
after training, performing rigid registration on two newly input consecutive stereo image pairs of the same size using the outputs of the three networks to optimize the output of the pose estimation network; and calculating the rigid flow caused by camera motion from the output of the depth estimation network and the optimized pose, then performing a flow consistency check against the output of the optical flow estimation network to obtain the motion segmentation.
According to the technical scheme provided by the invention, the training data require only binocular RGB images, so data acquisition is very simple. By adopting a unified framework, optical flow, depth, pose and motion segmentation can be learned simultaneously; the training process of the model is simple and direct, very few parameters need to be tuned, and the scene transfer capability is strong. The model has good adaptability and can learn optical flow and geometric information of the environment, such as absolute-scale depth and pose, in an unsupervised end-to-end manner; because the estimated optical flow, pose and depth are highly accurate, dynamic objects can be segmented with high accuracy.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.
Fig. 1 is a flowchart of an unsupervised end-to-end driving environment sensing method based on deep learning according to an embodiment of the present invention;
fig. 2 is a framework diagram of an unsupervised end-to-end driving environment sensing method based on deep learning according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides an unsupervised end-to-end driving environment perception method based on deep learning, and as shown in fig. 1-2, a flow chart and a frame chart of the method are respectively provided. The method mainly comprises the following steps:
1. and acquiring images by using a binocular camera, and preprocessing to obtain training data.
In the embodiment of the invention, the binocular camera is applied to driving environment perception, so that the binocular camera is installed on a vehicle and is used for acquiring an environment image.
Before being fed to the networks for training, the original images acquired by the binocular camera are scaled in order to reduce training time, computational cost and hardware consumption, and the corresponding camera intrinsic parameters are scaled accordingly.
In addition, a data enhancement method is applied to improve the generalization performance of the model and reduce overfitting. Training data are generated in this way, and two consecutive stereo image pairs of the same size are extracted for each training iteration and fed to the networks. The two consecutive stereo image pairs of the same size are denoted L1, R1, L2 and R2, where L1, R1 are the left and right images at time t1 and L2, R2 are the left and right images at time t2; the image width and height are denoted W and H.
In the embodiment of the invention, data enhancement is performed in one or more of the following ways:
randomly correcting the input monocular image with a brightness factor γ;
scaling the image along the X and Y axes with scale factors sx and sy, and then randomly cropping the image to a specified size;
randomly rotating the image by r degrees, with nearest-neighbour interpolation;
random left-right flipping and random temporal order switching (exchanging t1 and t2).
Illustratively, the following settings may be employed: γ ∈ [0.7, 1.3], sx ∈ [1.0, 1.2], sy ∈ [1.0, 1.2], r ∈ [-5, 5]; the specified size may be set to 832 × 256.
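For illustration, the sketch below is a minimal NumPy/OpenCV reading of this augmentation pipeline (the helper name augment_pair and the use of OpenCV are assumptions, and the corresponding adjustment of the camera intrinsics after scaling and cropping is omitted); the sampling ranges follow the illustrative settings above.

```python
import numpy as np
import cv2  # assumed available for resizing and rotation

def augment_pair(L1, R1, L2, R2, crop_hw=(256, 832)):
    """Hypothetical sketch of the augmentation described above.
    Inputs are H x W x 3 images from the binocular camera (larger than crop_hw)."""
    # 1) random brightness correction with factor gamma in [0.7, 1.3]
    gamma = np.random.uniform(0.7, 1.3)
    imgs = [np.clip(im.astype(np.float32) * gamma, 0, 255) for im in (L1, R1, L2, R2)]

    # 2) random scaling along X and Y (sx, sy in [1.0, 1.2]) followed by a random crop
    sx, sy = np.random.uniform(1.0, 1.2, size=2)
    imgs = [cv2.resize(im, None, fx=sx, fy=sy, interpolation=cv2.INTER_LINEAR) for im in imgs]
    ch, cw = crop_hw
    h, w = imgs[0].shape[:2]
    y0, x0 = np.random.randint(0, h - ch + 1), np.random.randint(0, w - cw + 1)
    imgs = [im[y0:y0 + ch, x0:x0 + cw] for im in imgs]

    # 3) random rotation by r degrees in [-5, 5], nearest-neighbour interpolation
    r = np.random.uniform(-5, 5)
    M = cv2.getRotationMatrix2D((cw / 2, ch / 2), r, 1.0)
    imgs = [cv2.warpAffine(im, M, (cw, ch), flags=cv2.INTER_NEAREST) for im in imgs]

    # 4) random left-right flip and random temporal order switch (exchange t1 and t2)
    if np.random.rand() < 0.5:
        imgs = [im[:, ::-1].copy() for im in imgs]
    if np.random.rand() < 0.5:
        imgs = [imgs[2], imgs[3], imgs[0], imgs[1]]
    return imgs  # augmented L1, R1, L2, R2
```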
2. Train an optical flow estimation network, a pose estimation network, a depth estimation network and motion segmentation using two consecutive stereo image pairs of the same size from the training data.
In this step, the training of the optical flow estimation network, the pose estimation network, the depth estimation network and the motion segmentation using two consecutive same-size stereo image pairs from the training data is divided into the following two stages.
The first stage: train an optical flow estimation network using consecutive same-size stereo image pairs from the training data, and simultaneously train a pose estimation network and a depth estimation network.
In this stage, an optical flow estimation network is first trained using the two consecutive left images L1 and L2 and a designed optical flow loss function. Its output is the optical flow F_flow between the two consecutive same-size left images L1 and L2, with the same dimensions as the input images.
The optical flow loss function comprises an occlusion-aware reconstruction loss term and a smoothing loss term. The occlusion-aware reconstruction loss is a weighted average of the structural similarity (SSIM) loss and the absolute photometric difference over the non-occluded region; the smoothing loss is the mean absolute value of the edge-weighted second derivative of the optical flow over moving regions. A constraint on the optical flow over static regions is provided later by the consistency loss.
Here ψ(·) denotes the occlusion-aware reconstruction loss function, α is an adjustment coefficient, O1 denotes the non-occluded region, M1 denotes the loss mask, and N is the normalization coefficient (i.e., the number of pixels in the moving region). The left image reconstructed by warping L2 with the optical flow between L1 and L2 is denoted L̃1. e denotes the natural logarithm, (i, j) denotes the pixel position, the derivative operator acts along the x or y direction of the image (its square denotes the second-order derivative), a refers to the x or y direction of the image and indicates the direction of derivation, and β is a constant weight.
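As a rough illustration of a loss of this shape, the PyTorch-style sketch below combines an SSIM/absolute-difference photometric term masked by the non-occluded region with an edge-weighted second-order smoothness term; it is a simplified reading of the description rather than the exact formulation, and the helper names, window size and default weights alpha and beta are assumptions.

```python
import torch
import torch.nn.functional as F

def ssim_dissimilarity(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified SSIM over 3x3 windows, returned as a dissimilarity map in [0, 1]."""
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    ssim = ((2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2))
    return torch.clamp((1 - ssim) / 2, 0, 1)

def reconstruction_loss(L1, L1_rec, occ_mask, alpha=0.85):
    """Occlusion-aware photometric loss: weighted SSIM + absolute photometric
    difference, averaged over the non-occluded region (occ_mask = 1 where visible)."""
    photo = alpha * ssim_dissimilarity(L1, L1_rec) + (1 - alpha) * (L1 - L1_rec).abs()
    photo = photo.mean(1, keepdim=True)                       # average over colour channels
    return (photo * occ_mask).sum() / (occ_mask.sum() + 1e-7)

def smooth_loss(flow, image, region_mask, beta=10.0):
    """Edge-weighted second-order smoothness of the flow over a region
    (e.g. the moving region); weights decay with the image gradient."""
    def dx(t): return t[:, :, :, :-1] - t[:, :, :, 1:]
    def dy(t): return t[:, :, :-1, :] - t[:, :, 1:, :]
    w_x = torch.exp(-beta * dx(image).abs().mean(1, keepdim=True))
    w_y = torch.exp(-beta * dy(image).abs().mean(1, keepdim=True))
    d2x = dx(dx(flow)).abs() * w_x[:, :, :, 1:]
    d2y = dy(dy(flow)).abs() * w_y[:, :, 1:, :]
    n = region_mask.sum() + 1e-7
    return (d2x * region_mask[:, :, :, 2:]).sum() / n + (d2y * region_mask[:, :, 2:, :]).sum() / n
```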
Then, simultaneously training a pose estimation network and a depth estimation network:
using two successive left images L1And L2And designed rigid flow loss functionTraining a pose estimation network, outputting the pose estimation network as two continuous left images L1And L2Relative camera pose T therebetween12(ii) a Using two successive pairs L of stereo images of the same size1、R1、L2And R2And loss of stereoTraining a depth estimation network, the output of which is the disparity d between stereo image pairs, using a stereo camera baseline B and a horizontal focal length fxCalculating the absolute scale depth D ═ Bf through the parallax DxD, recording the calculated absolute scale depth as D1,2。
wherein, O1Representing non-occluded areas, M1Representing a loss mask;according to rigid flowAnd in combination with L2Two reconstructed left images, notedRigid flowBy absolute scale depth D1,2And pose T12Calculated (assuming the entire scene is static), rigid flowBy absolute scale depth D1,2And the optimized pose T'12Is calculated to obtain (T'12See below for the calculation of (c).
Will be provided withIs involved in the loss, since the rigid registration module is not differentiable, it is necessary to do soTo supervise the training pose estimation network.
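As a quick numeric illustration of the disparity-to-depth conversion D = B·fx/d (the baseline and focal length below are made-up example values, not taken from the patent):

```python
import numpy as np

def disparity_to_depth(disp, baseline, fx, eps=1e-6):
    """Absolute-scale depth from predicted disparity: D = B * fx / d."""
    return baseline * fx / np.maximum(disp, eps)

# Made-up example: a stereo rig with a 0.54 m baseline and fx = 721 px.
disp = np.array([[36.05, 7.21]])                 # disparity in pixels
print(disparity_to_depth(disp, 0.54, 721.0))     # ~[[10.8, 54.0]] metres
```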
The second stage: the optical flow estimation network, the pose estimation network, the depth estimation network and the motion segmentation are trained simultaneously using consecutive same-size stereo image pairs from the training data.
In this stage, the two consecutive same-size stereo image pairs L1, R1, L2 and R2 are used, together with the optical flow loss, the stereo loss, the rigid flow loss and the flow consistency loss, to simultaneously train the optical flow estimation network, the pose estimation network, the depth estimation network, the rigid registration module and the flow consistency check module.
The optical flow estimation network, pose estimation network and depth estimation network trained in this stage follow the same training process as in the first stage and produce the same outputs, so the description is not repeated. The difference is that motion segmentation is now trained simultaneously by combining the outputs of the three networks; since this part works in the same way in the training and test stages, it is described later to avoid redundancy. This training strategy avoids the vanishing-gradient problem during network training.
Alternatively, the optical flow estimation network may employ the PWC-Net framework, which merges several classical optical flow estimation techniques, including image pyramids, warping and cost volumes, in an end-to-end trainable deep neural network to achieve state-of-the-art results. The pose estimation network can adopt a framework based on a recurrent convolutional neural network (RCNN): features extracted by the CNN are fed into two convolutional LSTM (ConvLSTM) layers to output a 6-DoF pose consisting of a translation (tx, ty, tz) and rotation angles. The depth estimation network can employ an encoder-decoder architecture based on ResNet50 and estimates a dense depth map of the same size as the input RGB image.
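Downstream modules use this 6-DoF output as a rigid transformation; the sketch below converts a translation (tx, ty, tz) and Euler rotation angles into a 4×4 matrix T12. The X-Y-Z rotation order is an assumption, since the patent does not specify the angle convention.

```python
import numpy as np

def pose_vec_to_mat(t, angles):
    """Build a 4x4 rigid transform from translation t = (tx, ty, tz) and
    Euler angles (rx, ry, rz) in radians, using an assumed X-Y-Z rotation order."""
    rx, ry, rz = angles
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(rx), -np.sin(rx)],
                   [0, np.sin(rx),  np.cos(rx)]])
    Ry = np.array([[ np.cos(ry), 0, np.sin(ry)],
                   [0, 1, 0],
                   [-np.sin(ry), 0, np.cos(ry)]])
    Rz = np.array([[np.cos(rz), -np.sin(rz), 0],
                   [np.sin(rz),  np.cos(rz), 0],
                   [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx
    T[:3, 3] = t
    return T

T12 = pose_vec_to_mat((0.02, 0.0, 0.8), (0.001, -0.01, 0.002))  # made-up pose values
```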
3. After training, rigid registration is performed on two newly input consecutive stereo image pairs of the same size using the outputs of the three networks to optimize the output of the pose estimation network; the rigid flow caused by camera motion is then calculated from the output of the depth estimation network and the optimized pose, and a flow consistency check is performed against the output of the optical flow estimation network to obtain the motion segmentation.
1) A rigid registration module.
The rigid registration module uses the optical flow output by the optical flow estimation network and the absolute scale depth D1,2 computed from the disparity d output by the depth estimation network to optimize the pose T12 output by the pose estimation network, obtaining the optimized pose T'12.
During rigid registration, points in the 2D image space are converted into 3D point clouds using:
Qk(i, j) = Dk(i, j)·K⁻¹·Pk(i, j),  k = 1, 2
where Pk(i, j) is the homogeneous coordinate of the pixel at position (i, j) of image Lk, K is the camera intrinsic matrix, Dk(i, j) is the absolute scale depth at position (i, j) of image Lk, and Qk(i, j) is the corresponding 3D coordinate of the pixel at position (i, j) of image Lk.
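A minimal sketch of this back-projection, assuming a 3×3 intrinsic matrix K, a dense H×W absolute-scale depth map, and pixel homogeneous coordinates (u, v, 1):

```python
import numpy as np

def backproject(depth, K):
    """Q(i, j) = D(i, j) * K^-1 * P(i, j): lift every pixel to a 3D point.
    depth: H x W absolute-scale depth map, K: 3 x 3 camera intrinsics.
    Returns an H x W x 3 point cloud in camera coordinates."""
    H, W = depth.shape
    j, i = np.meshgrid(np.arange(W), np.arange(H))                      # pixel column/row grids
    P = np.stack([j, i, np.ones_like(i)], axis=-1).astype(np.float64)   # homogeneous (u, v, 1)
    rays = P @ np.linalg.inv(K).T                                       # K^-1 * P for every pixel
    return depth[..., None] * rays                                      # scale rays by depth
```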
The pose T12 is used to transform the 3D point cloud Q1 into a 3D point cloud Q̂2 (which can be understood as the point cloud constructed from the 3D coordinates of the points of L1 at time t2). In addition, using bilinear sampling, the 3D point cloud Q2 is warped back to time t1 according to the optical flow, yielding the corresponding 3D point cloud Q̃2; the correspondence established by this warping step makes Q̃2 correspond to Q̂2.
Here W and H denote the width and height of the image, respectively, and the two components of the optical flow along the x and y axes are used in the warping.
If all estimates were perfectly accurate, Q̃2 would equal Q̂2 in the static, non-occluded regions of the scene. Therefore, the backward optical flow is first used to estimate the non-occluded region O1, and the pose estimate is then refined by closely aligning the two point clouds over the non-occluded region. Specifically, the pose refinement ΔT is estimated by minimizing the distance between Q̂2 and Q̃2 over a selected region R:
ΔT = argmin_ΔT Σ_{(i,j) ∈ R} ‖ΔT·Q̂2(i, j) − Q̃2(i, j)‖
where the region R consists of the points in the corresponding non-occluded regions whose distances between Q̂2 and Q̃2 rank in the smallest R% (e.g., 25%). This is intended to exclude points in moving regions, since such points tend to have larger distances between Q̂2 and Q̃2. Combining T12 and ΔT yields the optimized pose T'12:
T'12 = ΔT × T12.
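To make the refinement step concrete, the sketch below selects the R% of non-occluded points with the smallest distances between Q̂2 and Q̃2 and estimates ΔT from them; solving the alignment with a closed-form SVD (Kabsch/Umeyama) step is an assumption, since the patent does not state how the minimization is carried out.

```python
import numpy as np

def refine_pose(Q2_hat, Q2_tilde, non_occluded, T12, keep_ratio=0.25):
    """Estimate dT aligning Q2_hat (T12-transformed Q1) to Q2_tilde (flow-warped Q2)
    over the keep_ratio fraction of non-occluded points with the smallest distances,
    then return the optimized pose T12' = dT @ T12. Point clouds are H x W x 3."""
    A = Q2_hat[non_occluded].reshape(-1, 3)
    B = Q2_tilde[non_occluded].reshape(-1, 3)
    d = np.linalg.norm(A - B, axis=1)
    keep = np.argsort(d)[: max(1, int(keep_ratio * len(d)))]   # region R: smallest R%
    A, B = A[keep], B[keep]

    # Closed-form rigid alignment (Kabsch/Umeyama) of A onto B.
    ca, cb = A.mean(0), B.mean(0)
    H = (A - ca).T @ (B - cb)
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    t = cb - R @ ca

    dT = np.eye(4)
    dT[:3, :3], dT[:3, 3] = R, t
    return dT @ T12          # T12' = dT x T12
```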
2) Flow consistency and motion segmentation.
With the optimized pose T'12, the rigid flow caused by camera motion can be calculated as:
F_rig(i, j) = K·T'12·D1(i, j)·K⁻¹·P1(i, j) − P1(i, j)
where K is the camera intrinsic matrix and P1 denotes the homogeneous coordinates of the pixels in L1.
If both the rigid flow F_rig and the optical flow F_flow output by the optical flow estimation network are accurate, their values should match in static regions and differ in moving regions. A consistency check is therefore performed between the rigid flow and the optical flow: if the difference between the two flows is greater than a threshold δ, the corresponding region is marked as moving foreground M1, and the rest of the image is marked as static background M0, so that the image loss mask is
M1(i, j) = 1 if (i, j) ∈ O1 and ‖F_flow(i, j) − F_rig(i, j)‖ > δ, and M1(i, j) = 0 otherwise.
Because the optical flow is less accurate in occluded regions, which may lead to false positives, the estimated motion region is by default restricted to the non-occluded region O1.
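Under the reconstruction above, the check can be sketched as follows: the rigid flow is computed from the depth and the optimized pose, compared with the estimated optical flow, and pixels in the non-occluded region whose flow difference exceeds δ are marked as moving foreground M1 (the threshold value used below is a made-up example).

```python
import numpy as np

def rigid_flow(depth1, K, T12_opt):
    """Rigid flow induced by camera motion: reproject K*T*D*K^-1*P and subtract P."""
    H, W = depth1.shape
    j, i = np.meshgrid(np.arange(W), np.arange(H))
    P = np.stack([j, i, np.ones_like(i)], -1).astype(np.float64)       # (u, v, 1)
    Q1 = depth1[..., None] * (P @ np.linalg.inv(K).T)                  # 3D points at t1
    Q1_h = np.concatenate([Q1, np.ones((H, W, 1))], -1)
    Q2 = (Q1_h @ T12_opt.T)[..., :3]                                   # points moved to t2
    p2 = Q2 @ K.T
    p2 = p2[..., :2] / np.maximum(p2[..., 2:3], 1e-6)                  # reproject to pixels
    return p2 - P[..., :2]                                             # rigid flow (du, dv)

def motion_segmentation(flow_est, flow_rig, non_occluded, delta=3.0):
    """M1 = 1 where optical flow and rigid flow disagree by more than delta
    inside the non-occluded region; the rest is static background M0."""
    diff = np.linalg.norm(flow_est - flow_rig, axis=-1)
    M1 = (diff > delta) & non_occluded
    return M1
```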
The rigid flow is more accurate than the optical flow in static regions. Therefore, the rigid flow is used to guide the learning of the optical flow through the following flow consistency loss ℓ_con:
ℓ_con = (1/N) Σ_{(i,j) ∈ M0} ‖F_flow(i, j) − SG(F_rig(i, j))‖
where SG denotes the stop gradient, F_rig is the rigid flow caused by camera motion, and N is a normalization coefficient.
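A PyTorch-style sketch of a consistency loss of this shape is shown below, with tensor.detach() playing the role of the stop gradient SG; restricting the penalty to the static region follows the description, while the exact mask and norm are assumptions.

```python
import torch

def flow_consistency_loss(flow_est, flow_rig, static_mask):
    """l_con sketch: penalize the optical flow for deviating from the
    (stop-gradient) rigid flow over static pixels. flow_* are B x 2 x H x W,
    static_mask is B x 1 x H x W with 1 on the static background M0."""
    diff = (flow_est - flow_rig.detach()).abs().sum(dim=1, keepdim=True)  # SG via detach
    return (diff * static_mask).sum() / (static_mask.sum() + 1e-7)        # N = #static pixels
```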
Based on the above, the total loss of the model shown in fig. 2 is the weighted sum of the optical flow loss, the stereo loss, the rigid flow loss and the flow consistency loss:
ℓ_total = λ_flow·ℓ_flow + λ_stereo·ℓ_stereo + λ_rigid·ℓ_rigid + λ_con·ℓ_con
where each λ is the weight coefficient of the corresponding loss term.
Through the above description of the embodiments, it is clear to those skilled in the art that the above embodiments can be implemented by software, and can also be implemented by software plus a necessary general hardware platform. With this understanding, the technical solutions of the embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments of the present invention.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. An unsupervised end-to-end driving environment perception method based on deep learning is characterized by comprising the following steps:
acquiring images by using a binocular camera, and preprocessing to obtain training data;
training an optical flow estimation network, a pose estimation network, a depth estimation network and motion segmentation using two consecutive stereo image pairs of the same size from the training data;
after training, performing rigid registration on two newly input consecutive stereo image pairs of the same size using the outputs of the three networks to optimize the output of the pose estimation network; and calculating the rigid flow caused by camera motion from the output of the depth estimation network and the optimized pose, then performing a flow consistency check against the output of the optical flow estimation network to obtain the motion segmentation.
2. The unsupervised end-to-end driving environment perception method based on deep learning of claim 1, wherein the image acquisition using a binocular camera and the obtaining of the training data through preprocessing comprises:
firstly, scaling the original images acquired by the binocular camera, and simultaneously scaling the corresponding camera intrinsic parameters;
then, generating training data by a data enhancement method;
the data enhancement method comprises the following steps of performing data enhancement in one or more ways:
randomly correcting the input monocular image by using a brightness factor gamma;
scaling the image along the X and Y axes with scale factors sx and sy, and then randomly cropping the image to a specified size;
randomly rotating the image by r degrees, and interpolating by using a nearest neighbor method;
random left-right turning and random time sequence switching.
3. The method as claimed in claim 1, wherein the training of the optical flow estimation network, the pose estimation network, the depth estimation network, and the motion segmentation by using two consecutive same-size stereo images in the training data comprises:
firstly, training an optical flow estimation network using consecutive same-size stereo image pairs from the training data, and simultaneously training a pose estimation network and a depth estimation network;
then, training the optical flow estimation network, the pose estimation network, the depth estimation network and the motion segmentation simultaneously using consecutive same-size stereo image pairs from the training data.
4. The unsupervised end-to-end driving environment perception method based on deep learning of claim 3,
two consecutive stereo image pairs of the same size are denoted L1, R1, L2 and R2, where L1, R1 are the left and right images at time t1, and L2, R2 are the left and right images at time t2;
an optical flow estimation network is trained using the two consecutive left images L1 and L2 and a designed optical flow loss function, and its output is the optical flow between the two consecutive same-size left images L1 and L2;
Training a pose estimation network and a depth estimation network simultaneously:
a pose estimation network is trained using the two consecutive left images L1 and L2 and a designed rigid flow loss function, and its output is the relative camera pose T12 between the two consecutive left images L1 and L2; a depth estimation network is trained using the two consecutive same-size stereo image pairs L1, R1, L2 and R2 and a stereo loss, and its output is the disparity d between the images of each stereo pair; using the stereo camera baseline B and the horizontal focal length fx, the absolute scale depth is computed from the disparity as D = B·fx/d, and the computed absolute scale depth is denoted D1,2.
5. The unsupervised end-to-end driving environment perception method based on deep learning of claim 4,
the optical flow loss function comprises an occlusion-aware reconstruction loss term and a smoothing loss term;
where ψ(·) denotes the occlusion-aware reconstruction loss function, α denotes an adjustment coefficient, O1 denotes the non-occluded region, M1 denotes the loss mask, and N is a normalization coefficient; the reconstructed left image is obtained by warping L2 with the optical flow between L1 and L2; e denotes the natural logarithm, (i, j) denotes the pixel position, the derivative operator acts along the x or y direction of the image (its square denotes the second-order derivative), a refers to the x or y direction of the image and indicates the direction of derivation, and β is a weight.
6. The unsupervised end-to-end driving environment perception method based on deep learning of claim 4,
where ψ(·) denotes the occlusion-aware reconstruction loss function, O1 denotes the non-occluded region, and M1 denotes the loss mask; the two reconstructed left images are obtained by warping L2 according to the two rigid flows, one of which is computed from the absolute scale depth D1,2 and the pose T12, and the other from the absolute scale depth D1,2 and the optimized pose.
7. The method as claimed in claim 3, wherein the simultaneous training of the optical flow estimation network, the pose estimation network, the depth estimation network and the motion segmentation by using consecutive same-size stereo image pairs in the training data comprises:
two consecutive stereo image pairs of the same size are denoted L1, R1, L2 and R2, where L1, R1 are the left and right images at time t1, and L2, R2 are the left and right images at time t2;
the two consecutive same-size stereo image pairs L1, R1, L2 and R2 are used, together with the optical flow loss, the stereo loss, the rigid flow loss and the flow consistency loss, to simultaneously train an optical flow estimation network, a pose estimation network, a depth estimation network, a rigid registration module and a flow consistency check module.
8. The method as claimed in claim 7, wherein a rigid registration module uses the optical flow output by the optical flow estimation network and the absolute scale depth D1,2 computed from the disparity d output by the depth estimation network to optimize the pose T12 output by the pose estimation network, obtaining the optimized pose T'12;
During rigid registration, points in 2D image space are converted into 3D point clouds, the formula:
Qk(i,j)=Dk(i,j)K-1Pk(i,j),k=1,2
where Pk(i, j) is the homogeneous coordinate of the pixel at position (i, j) of image Lk, K is the camera intrinsic matrix, Dk(i, j) is the absolute scale depth at position (i, j) of image Lk, and Qk(i, j) is the corresponding 3D coordinate of the pixel at position (i, j) of image Lk;
the pose T12 is used to transform the 3D point cloud Q1 into a 3D point cloud Q̂2, and, using bilinear sampling, the 3D point cloud Q2 is warped back to time t1 according to the optical flow, obtaining the corresponding 3D point cloud Q̃2; the correspondence established by this warping step makes Q̃2 correspond to Q̂2;
where W and H denote the width and height of the image, respectively, and the two components of the optical flow along the x and y axes are used in the warping;
the pose refinement ΔT is estimated by minimizing the distance between Q̂2 and Q̃2 over a selected region R:
ΔT = argmin_ΔT Σ_{(i,j) ∈ R} ‖ΔT·Q̂2(i, j) − Q̃2(i, j)‖
where the region R consists of the points in the corresponding non-occluded regions whose distances between Q̂2 and Q̃2 rank in the smallest R%;
so that the optimized pose T'12 is obtained by the following formula:
T'12 = ΔT × T12.
9. The unsupervised end-to-end driving environment perception method based on deep learning of claim 7 or 8, wherein the rigid flow caused by camera motion is calculated by the following formula:
F_rig(i, j) = K·T'12·D1(i, j)·K⁻¹·P1(i, j) − P1(i, j)
where K is the camera intrinsic matrix and P1 denotes the homogeneous coordinates of the pixels in L1;
the loss mask is estimated by:
M1(i, j) = 1 if (i, j) ∈ O1 and ‖F_flow(i, j) − F_rig(i, j)‖ > δ, and M1(i, j) = 0 otherwise,
where O1 denotes the non-occluded region and δ is the threshold.
10. The unsupervised end-to-end driving environment perception method based on deep learning of claim 7 or 8, wherein the flow consistency loss is expressed as:
ℓ_con = (1/N) Σ_{(i,j) ∈ M0} ‖F_flow(i, j) − SG(F_rig(i, j))‖
where SG denotes the stop gradient, F_rig is the rigid flow caused by camera motion, F_flow is the optical flow, M0 is the static background, and N is a normalization coefficient.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911345900.9A CN111105432B (en) | 2019-12-24 | 2019-12-24 | Unsupervised end-to-end driving environment perception method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911345900.9A CN111105432B (en) | 2019-12-24 | 2019-12-24 | Unsupervised end-to-end driving environment perception method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111105432A true CN111105432A (en) | 2020-05-05 |
CN111105432B CN111105432B (en) | 2023-04-07 |
Family
ID=70423494
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911345900.9A Active CN111105432B (en) | 2019-12-24 | 2019-12-24 | Unsupervised end-to-end driving environment perception method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111105432B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111627056A (en) * | 2020-05-14 | 2020-09-04 | 清华大学 | Depth estimation-based driving visibility determination method and device |
CN111629194A (en) * | 2020-06-10 | 2020-09-04 | 北京中科深智科技有限公司 | Method and system for converting panoramic video into 6DOF video based on neural network |
CN113140011A (en) * | 2021-05-18 | 2021-07-20 | 烟台艾睿光电科技有限公司 | Infrared thermal imaging monocular vision distance measurement method and related assembly |
CN113838104A (en) * | 2021-08-04 | 2021-12-24 | 浙江大学 | Registration method based on multispectral and multi-mode image consistency enhancement network |
CN113902807A (en) * | 2021-08-19 | 2022-01-07 | 江苏大学 | Electronic component three-dimensional reconstruction method based on semi-supervised learning |
CN114187581A (en) * | 2021-12-14 | 2022-03-15 | 安徽大学 | Driver distraction fine-grained detection method based on unsupervised learning |
CN114359363A (en) * | 2022-01-11 | 2022-04-15 | 浙江大学 | Video consistency depth estimation method and device based on deep learning |
CN114494332A (en) * | 2022-01-21 | 2022-05-13 | 四川大学 | Unsupervised estimation method for scene flow from synthesis to real LiDAR point cloud |
GB2618775A (en) * | 2022-05-11 | 2023-11-22 | Continental Autonomous Mobility Germany GmbH | Self-supervised learning of scene flow |
WO2024051184A1 (en) * | 2022-09-07 | 2024-03-14 | 南京逸智网络空间技术创新研究院有限公司 | Optical flow mask-based unsupervised monocular depth estimation method |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110097109A (en) * | 2019-04-25 | 2019-08-06 | 湖北工业大学 | A kind of road environment obstacle detection system and method based on deep learning |
US20190265712A1 (en) * | 2018-02-27 | 2019-08-29 | Nauto, Inc. | Method for determining driving policy |
CN110189278A (en) * | 2019-06-06 | 2019-08-30 | 上海大学 | A kind of binocular scene image repair method based on generation confrontation network |
CN110443843A (en) * | 2019-07-29 | 2019-11-12 | 东北大学 | A kind of unsupervised monocular depth estimation method based on generation confrontation network |
CN110490928A (en) * | 2019-07-05 | 2019-11-22 | 天津大学 | A kind of camera Attitude estimation method based on deep neural network |
CN110490919A (en) * | 2019-07-05 | 2019-11-22 | 天津大学 | A kind of depth estimation method of the monocular vision based on deep neural network |
WO2019223382A1 (en) * | 2018-05-22 | 2019-11-28 | 深圳市商汤科技有限公司 | Method for estimating monocular depth, apparatus and device therefor, and storage medium |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190265712A1 (en) * | 2018-02-27 | 2019-08-29 | Nauto, Inc. | Method for determining driving policy |
WO2019223382A1 (en) * | 2018-05-22 | 2019-11-28 | 深圳市商汤科技有限公司 | Method for estimating monocular depth, apparatus and device therefor, and storage medium |
CN110097109A (en) * | 2019-04-25 | 2019-08-06 | 湖北工业大学 | A kind of road environment obstacle detection system and method based on deep learning |
CN110189278A (en) * | 2019-06-06 | 2019-08-30 | 上海大学 | A kind of binocular scene image repair method based on generation confrontation network |
CN110490928A (en) * | 2019-07-05 | 2019-11-22 | 天津大学 | A kind of camera Attitude estimation method based on deep neural network |
CN110490919A (en) * | 2019-07-05 | 2019-11-22 | 天津大学 | A kind of depth estimation method of the monocular vision based on deep neural network |
CN110443843A (en) * | 2019-07-29 | 2019-11-12 | 东北大学 | A kind of unsupervised monocular depth estimation method based on generation confrontation network |
Non-Patent Citations (3)
Title |
---|
- 周云成; 许童羽; 邓寒冰; 苗腾; 吴琼: "Depth estimation method for tomato plant images based on self-supervised learning" *
- 毕天腾; 刘越; 翁冬冬; 王涌天: "A survey of single-image depth estimation based on supervised learning" *
- 黄军; 王聪; 刘越; 毕天腾: "A survey of advances in monocular depth estimation techniques" *
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111627056A (en) * | 2020-05-14 | 2020-09-04 | 清华大学 | Depth estimation-based driving visibility determination method and device |
CN111627056B (en) * | 2020-05-14 | 2023-09-01 | 清华大学 | Driving visibility determination method and device based on depth estimation |
CN111629194A (en) * | 2020-06-10 | 2020-09-04 | 北京中科深智科技有限公司 | Method and system for converting panoramic video into 6DOF video based on neural network |
CN111629194B (en) * | 2020-06-10 | 2021-01-26 | 北京中科深智科技有限公司 | Method and system for converting panoramic video into 6DOF video based on neural network |
CN113140011B (en) * | 2021-05-18 | 2022-09-06 | 烟台艾睿光电科技有限公司 | Infrared thermal imaging monocular vision distance measurement method and related components |
CN113140011A (en) * | 2021-05-18 | 2021-07-20 | 烟台艾睿光电科技有限公司 | Infrared thermal imaging monocular vision distance measurement method and related assembly |
CN113838104B (en) * | 2021-08-04 | 2023-10-27 | 浙江大学 | Registration method based on multispectral and multimodal image consistency enhancement network |
CN113838104A (en) * | 2021-08-04 | 2021-12-24 | 浙江大学 | Registration method based on multispectral and multi-mode image consistency enhancement network |
CN113902807A (en) * | 2021-08-19 | 2022-01-07 | 江苏大学 | Electronic component three-dimensional reconstruction method based on semi-supervised learning |
CN114187581A (en) * | 2021-12-14 | 2022-03-15 | 安徽大学 | Driver distraction fine-grained detection method based on unsupervised learning |
CN114187581B (en) * | 2021-12-14 | 2024-04-09 | 安徽大学 | Driver distraction fine granularity detection method based on unsupervised learning |
CN114359363A (en) * | 2022-01-11 | 2022-04-15 | 浙江大学 | Video consistency depth estimation method and device based on deep learning |
CN114494332A (en) * | 2022-01-21 | 2022-05-13 | 四川大学 | Unsupervised estimation method for scene flow from synthesis to real LiDAR point cloud |
CN114494332B (en) * | 2022-01-21 | 2023-04-25 | 四川大学 | Unsupervised synthesis to real LiDAR point cloud scene flow estimation method |
GB2618775A (en) * | 2022-05-11 | 2023-11-22 | Continental Autonomous Mobility Germany GmbH | Self-supervised learning of scene flow |
WO2024051184A1 (en) * | 2022-09-07 | 2024-03-14 | 南京逸智网络空间技术创新研究院有限公司 | Optical flow mask-based unsupervised monocular depth estimation method |
Also Published As
Publication number | Publication date |
---|---|
CN111105432B (en) | 2023-04-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111105432B (en) | Unsupervised end-to-end driving environment perception method based on deep learning | |
Shu et al. | Feature-metric loss for self-supervised learning of depth and egomotion | |
Mitrokhin et al. | EV-IMO: Motion segmentation dataset and learning pipeline for event cameras | |
US11315266B2 (en) | Self-supervised depth estimation method and system | |
Zhu et al. | Unsupervised event-based learning of optical flow, depth, and egomotion | |
CN111325794B (en) | Visual simultaneous localization and map construction method based on depth convolution self-encoder | |
Zhan et al. | Unsupervised learning of monocular depth estimation and visual odometry with deep feature reconstruction | |
CN109377530B (en) | Binocular depth estimation method based on depth neural network | |
US10818029B2 (en) | Multi-directional structured image array capture on a 2D graph | |
CN110782490B (en) | Video depth map estimation method and device with space-time consistency | |
TWI709107B (en) | Image feature extraction method and saliency prediction method including the same | |
WO2018000752A1 (en) | Monocular image depth estimation method based on multi-scale cnn and continuous crf | |
CN110689008A (en) | Monocular image-oriented three-dimensional object detection method based on three-dimensional reconstruction | |
CN111783582A (en) | Unsupervised monocular depth estimation algorithm based on deep learning | |
CN108491763B (en) | Unsupervised training method and device for three-dimensional scene recognition network and storage medium | |
WO2024051184A1 (en) | Optical flow mask-based unsupervised monocular depth estimation method | |
CN115035171B (en) | Self-supervision monocular depth estimation method based on self-attention guide feature fusion | |
CN113850900B (en) | Method and system for recovering depth map based on image and geometric clues in three-dimensional reconstruction | |
CN111950477A (en) | Single-image three-dimensional face reconstruction method based on video surveillance | |
Song et al. | Self-supervised depth completion from direct visual-lidar odometry in autonomous driving | |
CN113065506B (en) | Human body posture recognition method and system | |
CN116188550A (en) | Self-supervision depth vision odometer based on geometric constraint | |
Lee et al. | Globally consistent video depth and pose estimation with efficient test-time training | |
CN112634331A (en) | Optical flow prediction method and device | |
Su et al. | Omnidirectional depth estimation with hierarchical deep network for multi-fisheye navigation systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |