CN116151320A - Visual odometer method and device for resisting dynamic target interference - Google Patents
Visual odometer method and device for resisting dynamic target interference
- Publication number
- CN116151320A (application CN202211171786.4A)
- Authority
- CN
- China
- Prior art keywords
- adjacent frames
- input images
- visual
- camera
- rotation matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G01C22/00—Measuring distance traversed on the ground by vehicles, persons, animals or other moving solid bodies, e.g. using odometers, using pedometers
- G06N3/08—Learning methods
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
- Y02T10/40—Engine management systems
Abstract
The invention relates to a visual odometer method and device for resisting dynamic target interference, wherein the method comprises the following steps: constructing a vehicle target detection network for detecting vehicles; acquiring road images of adjacent frames output by a vision camera, detecting target vehicles in the road images based on the vehicle target detection network, and marking each identified target vehicle with an anchor frame; removing the pixel content within the anchor frame ranges in the road images to obtain the input images processed by the visual odometer; the visual odometer acquires the input images of the adjacent frames and extracts feature points from each input image with a feature extraction algorithm; matching the feature points of the input images of the adjacent frames with a feature matching algorithm; and acquiring a rotation matrix and a translation vector between the input images of the adjacent frames based on the feature point matching result, so as to perform motion estimation on the vision camera.
Description
Technical Field
The invention relates to the field of automatic driving, in particular to a visual odometer method and a visual odometer device for resisting dynamic target interference.
Background
Visual odometry (VO) is the process of estimating self-motion using a single camera or multiple cameras as the only input; its application fields include autonomous driving, robotics, unmanned aerial vehicles, augmented reality, and so on. The term VO was coined by Nister in his landmark 2004 paper. It is closely analogous to wheel odometry, which incrementally estimates the motion of a vehicle by integrating the number of wheel revolutions over time. Similarly, VO uses an on-board camera to detect the image changes induced by motion and thereby estimate the carrier pose. For VO to work effectively, there must be sufficient illumination in the environment and enough texture in the static scene for motion features to be extracted. In addition, consecutive images must be captured with sufficient scene overlap.
Compared with wheel odometry, VO has the advantage of not being affected by wheel slip on uneven ground or under other adverse conditions. VO also provides more accurate trajectory estimation, with a relative position error typically in the range of 0.1% to 2%. This makes VO a beneficial complement to wheel odometry and to other navigation systems such as global satellite navigation systems (Global Positioning System, GPS), inertial measurement units (Inertial Measurement Unit, IMU) and radar ranging systems. VO is particularly important in environments where GPS fails, such as among urban high-rise buildings, in tunnels, underwater or in space.
The main VO approaches are classified into the feature point method and the direct method. The feature point method is currently the mainstream; it can still work when the noise is large and the camera moves fast, but the map it produces consists only of sparse feature points. The direct method can build a dense map without extracting features, but it suffers from a large amount of computation and poor robustness.
Under conditions of rich texture, good illumination and no dynamic target interference, existing VO systems can achieve satisfactory accuracy and performance. However, in environments such as urban roads, the presence of a large number of dynamic targets such as vehicles and pedestrians significantly affects the extraction of VO feature points and the calculation of optical flow, which introduces errors into the estimation of the carrier pose and, in severe cases, even causes the VO system to diverge and stop working normally. For example, Chinese patent CN 109813334A discloses a real-time high-precision vehicle odometry calculation method based on binocular vision, in which feature extraction and matching are performed directly on the acquired adjacent frame images; such a scheme is easily affected by objects such as dynamic vehicles.
Disclosure of Invention
The invention aims to provide a visual odometer method and a visual odometer device for resisting dynamic target interference.
In order to achieve the above object, the present invention provides a visual odometer method for resisting dynamic target interference, comprising:
s1, constructing a vehicle target detection network for detecting a vehicle;
s2, acquiring road images of adjacent frames output by a vision camera, respectively detecting target vehicles on the road images based on the vehicle target detection network, and respectively marking the identified target vehicles by adopting anchor frames;
s3, eliminating pixel contents in the anchor frame range in the road image respectively to serve as an input image processed by a visual odometer;
s4, the visual odometer acquires the input images of the adjacent frames, and adopts a feature extraction algorithm to extract feature points in the input images respectively;
s5, adopting a feature matching algorithm and carrying out feature point matching on the input images of the adjacent frames based on the feature points;
s6, based on the characteristic point matching result of the input images of the adjacent frames, acquiring a rotation matrix and a translation vector between the input images of the adjacent frames, and performing motion estimation on the vision camera.
According to one aspect of the present invention, in step S6, based on a feature point matching result of the input images of adjacent frames, a rotation matrix and a translation vector between the input images of adjacent frames are acquired, and the step of performing motion estimation on the visual camera includes:
constructing a relative pose relationship when the visual camera captures the road image of the adjacent frame based on the rotation matrix and the translation vector;
acquiring absolute pose tracks of the visual camera when capturing an initial frame road image based on the relative pose relation;
and performing motion estimation on the visual camera based on the absolute pose track.
According to one aspect of the invention, the step of acquiring absolute pose tracks of the visual camera with respect to capturing an initial frame road image based on the relative pose relationship comprises:
acquiring the relative pose change of the visual camera corresponding to the road image of the adjacent frame based on the relative pose relation;
acquiring an absolute pose of the visual camera when capturing the road image of each frame based on the pose of the visual camera when capturing the road image of an initial frame and based on the relative pose change;
and based on the absolute pose of the vision camera, sequentially connecting according to a time sequence, and performing incremental pose track reconstruction to acquire the absolute pose track.
According to one aspect of the present invention, in the step of acquiring absolute pose tracks of the visual camera with respect to capturing an initial frame road image based on the relative pose relationship, the method further comprises:
extracting a local track in the absolute pose track to perform local track optimization; wherein, include:
extracting track segments containing m absolute poses of the vision camera from the absolute pose track as the local track;
and iteratively calculating the minimum value of the sum of the 3D point cloud re-projection error squares of the road image corresponding to the m absolute poses in the local track, and completing the optimization of the local track.
According to one aspect of the invention, in the step of constructing a relative pose relationship when the visual camera captures the road image of adjacent frames based on the rotation matrix and the translation vector, the relative pose relationship is expressed as:

$$T_{k,k-1} = \begin{bmatrix} R_{k,k-1} & t_{k,k-1} \\ 0 & 1 \end{bmatrix}$$

wherein R_{k,k-1} is the rotation matrix from the image coordinates at time k-1 to the image coordinates at time k, and t_{k,k-1} is the translation vector.
According to one aspect of the invention, the vision camera employs a monocular, binocular or depth camera.
According to one aspect of the invention, if the vision camera adopts a monocular camera, the rotation matrix and the translation vector are obtained in a 2D-to-2D mode, or the rotation matrix and the translation vector are obtained in a 3D-to-2D mode;
if the rotation matrix and the translation vector are obtained in a 2D-to-2D mode, in step S5, feature point matching is performed by adopting the input images of two adjacent frames;
in step S6, the step of obtaining the rotation matrix and the translation vector between the input images of the adjacent frames based on the feature point matching result of the input images of the adjacent frames includes:
acquiring an essential matrix of the input image of the adjacent frame based on the characteristic point matching result;
decomposing the rotation matrix and the translation vector from the essential matrix;
if the rotation matrix and the translation vector are obtained in a 3D-to-2D manner, in step S5, feature point matching is performed by using the input images of three adjacent frames, which includes:
triangularizing the input images by adopting the first two frames to obtain a 3D image;
performing feature point matching based on the 3D image and the input image of a third frame;
in step S6, in the step of obtaining the rotation matrix and the translation vector between the input images of the adjacent frames based on the feature point matching result of the input images of the adjacent frames, the rotation matrix and the translation vector are obtained by adopting PnP algorithm based on the feature point matching result.
According to one aspect of the invention, if the vision camera adopts a binocular camera or a depth camera, the rotation matrix and the translation vector are acquired in a 3D-3D manner, or the rotation matrix and the translation vector are acquired in a 3D-2D manner;
if the rotation matrix and the translation vector are obtained in a 3D-3D mode, in step S5, feature point matching is respectively carried out on the input images of two adjacent frames with left eyes and the input images of two adjacent frames with right eyes; or, matching the input images of the left eye and the right eye at the same moment to generate 3D matching images, and matching the characteristic points based on the 3D matching images of the adjacent frames;
in step S6, the step of obtaining the rotation matrix and the translation vector between the input images of the adjacent frames based on the feature point matching result of the input images of the adjacent frames includes:
triangularizing the matched characteristic points based on the characteristic point matching result;
acquiring 3D features based on the triangulated feature points, and calculating the rotation matrix and the translation vector based on the 3D features;
if the rotation matrix and the translation vector are acquired in a 3D-to-2D manner, step S5 includes:
triangularizing the input images of the left eye and the right eye at the previous moment to obtain a 3D image;
performing feature point matching based on the 3D image and the input image of the left eye or the right eye at the later moment;
in step S6, in the step of obtaining the rotation matrix and the translation vector between the input images of the adjacent frames based on the feature point matching result of the input images of the adjacent frames, the rotation matrix and the translation vector are obtained by adopting PnP algorithm based on the feature point matching result.
In order to achieve the above object, the present invention provides a visual odometer device adopting the above visual odometer method, comprising:
the visual camera is used for acquiring an external road image;
the vehicle target detection module is used for receiving the road image output by the visual camera, detecting a target vehicle and marking an anchor frame;
the image processing module is used for eliminating pixels in the anchor frame range in the road image and used as an input image processed by the visual odometer;
and the visual odometer module is used for matching the input images of the adjacent frames and outputting characteristic point matching results, and acquiring a rotation matrix and a translation vector between the input images of the adjacent frames based on the characteristic point matching results so as to perform motion estimation on the visual camera.
According to one aspect of the invention, the vision camera is a monocular camera, a binocular camera, or a depth camera.
According to the scheme of the invention, by detecting and removing objects such as vehicles and pedestrians in the image, the possible interference of dynamic targets is reduced, and the accuracy and robustness of the visual odometer in application scenarios such as urban roads are improved.
According to one aspect of the invention, the functions of the invention are implemented by algorithms and do not depend on external sensors such as laser radar, so the application cost is low. Compared with other dynamic target suppression algorithms, the method can more thoroughly eliminate potential dynamic target interference, and its effect is more pronounced.
According to one scheme of the invention, the invention is applicable to different types of vision cameras and has extremely high adaptability, usability and application prospects.
Drawings
FIG. 1 schematically illustrates a block diagram of steps of a visual odometry method according to an embodiment of the invention;
FIG. 2 schematically illustrates a flow chart of a visual odometry method according to an embodiment of the invention;
FIG. 3 schematically illustrates an original image taken by a vision camera in a vision odometry method in accordance with one embodiment of the present invention;
FIG. 4 schematically illustrates a vehicle target identified and detected using a vehicle target detection network in a visual odometry method according to an embodiment of the invention;
FIG. 5 schematically illustrates an input image for use in a visual odometer calculation after rejection of a vehicle target in an anchor frame in a visual odometer method in accordance with an embodiment of the invention;
FIG. 6 schematically illustrates a diagram of tracking an environmental image and its features by camera movement in a visual odometry method according to an embodiment of the invention;
FIG. 7 schematically illustrates a diagram of epipolar constraint geometry in a visual odometry method according to one embodiment of the invention.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments will be briefly described below. It is apparent that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
In describing embodiments of the present invention, the terms "longitudinal," "transverse," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer" and the like are used in terms of orientation or positional relationship based on that shown in the drawings, which are merely for convenience of description and to simplify the description, rather than to indicate or imply that the devices or elements referred to must have a specific orientation, be constructed and operate in a specific orientation, and thus the above terms should not be construed as limiting the present invention.
Referring to fig. 1 and 2, according to an embodiment of the present invention, a visual odometer method for resisting dynamic target interference includes:
s1, constructing a vehicle target detection network for detecting a vehicle;
s2, acquiring road images of adjacent frames output by the vision camera, respectively detecting target vehicles on the road images based on a vehicle target detection network, and respectively marking the identified target vehicles by adopting anchor frames;
s3, eliminating pixel contents in an anchor frame range in the road image respectively to serve as an input image processed by the visual odometer;
s4, acquiring input images of adjacent frames by a visual odometer, and respectively extracting characteristic points in the input images by a characteristic extraction algorithm;
s5, adopting a feature matching algorithm and carrying out feature point matching on the input images of the adjacent frames based on the feature points;
s6, based on the characteristic point matching result of the input images of the adjacent frames, a rotation matrix and a translation vector between the input images of the adjacent frames are obtained and used for carrying out motion estimation on the vision camera.
According to an embodiment of the present invention, in step S1, the step of constructing a vehicle object detection network for detecting a vehicle includes:
S11, acquiring and loading a vehicle target detection data set; in the present embodiment, the large open-source data set BDD100K is used. It is a large, diverse data set commonly used for autonomous driving applications, with more than 100,000 annotated images whose categories include buses, pedestrians, bicycles, trucks, cars, trains, riders, and so on. Of course, in this embodiment, a custom data set may be used for training instead of the existing public data set, but the pictures in the custom data set need to be manually labeled and converted into the required input format.
In this embodiment, the acquired vehicle target detection data set is divided into two parts: one part is used to train the deep learning target detection network and may be referred to as the "training set"; the other part is used to evaluate the deep learning target detection network and may be referred to as the "test set".
S12, building and training a vehicle target detection network based on deep learning; in this embodiment, a deep learning target detection algorithm, such as the region-based convolutional neural network (Region Convolutional Neural Networks, RCNN), the Single Shot MultiBox Detector (SSD), YOLO (You Only Look Once), spatial pyramid pooling (Spatial Pyramid Pooling, SPP), feature pyramid networks (Feature Pyramid Networks, FPN), RetinaNet, and the like, is used to build the deep learning target detection network.
Taking the YOLOv5 algorithm as an example, the target detection network consists of a feature extraction network and a detection network. The feature extraction network is typically a pre-trained convolutional neural network (Convolutional Neural Networks, CNN), although other pre-trained networks may be used. In contrast to the feature extraction network, the detection network is a small CNN that consists of several convolution layers and layers specific to YOLOv5. When the YOLOv5 target detection network is created, parameters such as the input size, the number of anchor frames and the feature extraction network are designed according to actual needs.
S13, performing data augmentation on the training set during the training of the vehicle target detection network in step S12, so as to further strengthen the training process; in this embodiment, data augmentation improves the accuracy of network training by randomly transforming the original data of the training set during training. Using data augmentation effectively enlarges the training set without increasing the number of actually labeled training samples. In this embodiment, the training set used for training may be augmented by randomly flipping the images and their associated box labels horizontally.
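As an illustration of this augmentation step, the following is a minimal sketch (an assumption for illustration, not code from the patent) of a horizontal flip that keeps the bounding-box labels consistent with the flipped image; boxes are assumed to be given in pixel coordinates as (x1, y1, x2, y2).

```python
import numpy as np

def random_hflip(image, boxes, rng, p=0.5):
    """Randomly flip an image and its (x1, y1, x2, y2) box labels horizontally."""
    if rng.random() >= p:
        return image, boxes
    h, w = image.shape[:2]
    flipped = image[:, ::-1].copy()              # mirror the pixel columns
    boxes = boxes.astype(np.float32)
    new_x1 = w - boxes[:, 2]                     # the new x1 comes from the old x2
    new_x2 = w - boxes[:, 0]                     # the new x2 comes from the old x1
    boxes[:, 0], boxes[:, 2] = new_x1, new_x2
    return flipped, boxes

# usage sketch
rng = np.random.default_rng(0)
img = np.zeros((480, 640, 3), dtype=np.uint8)
labels = np.array([[100, 200, 220, 300]])        # one vehicle anchor frame
aug_img, aug_labels = random_hflip(img, labels, rng)
```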
S14, evaluating the trained vehicle target detection network; in this embodiment, the trained vehicle target detection network is evaluated on the test set to measure its performance. Specifically, common metrics such as the mean average precision (Mean Average Precision, mAP) and the log-average miss rate (Log Average Miss Rate, LAMR) may be calculated. For example, when mAP is used to evaluate performance, it combines the ability of the vehicle target detection network to make correct classifications (precision) and the ability of the detector to find all relevant objects (recall). The precision/recall (PR) curve shows the precision of the detector at different recall levels, ideally 1 at every point. Therefore, to improve the mean average precision, more training data may be used to improve the training effect.
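To make the evaluation concrete, the sketch below (a simplified illustration, not the patent's evaluation code) counts true and false positives for one image at a fixed IoU threshold; precision and recall computed this way over the whole test set are the quantities that mAP summarizes.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def precision_recall(pred_boxes, gt_boxes, iou_thr=0.5):
    """Greedy matching of predictions (sorted by confidence) to ground truth at one IoU threshold."""
    matched, tp = set(), 0
    for p in pred_boxes:
        best_j, best_iou = -1, 0.0
        for j, g in enumerate(gt_boxes):
            if j in matched:
                continue
            v = iou(p, g)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_iou >= iou_thr:
            tp += 1
            matched.add(best_j)
    fp = len(pred_boxes) - tp
    fn = len(gt_boxes) - tp
    return tp / (tp + fp + 1e-9), tp / (tp + fn + 1e-9)  # precision, recall
```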
As shown in fig. 1, 2, 3 and 4, in step S2, the steps of acquiring road images of adjacent frames output by the vision camera, respectively detecting target vehicles on the road images based on the vehicle target detection network, and respectively marking the identified target vehicles by using anchor frames include:
s21, continuously shooting road images in the travelling direction with a vision camera fixed on the carrier (or capturing the road images from a video stream), and outputting the acquired road images, in time order, to the vehicle target detection network running online;
s22, sequentially acquiring corresponding road images by a vehicle target detection network and detecting the vehicle target;
s23, respectively performing anchor frame marking on the vehicle targets in the road image based on the vehicle target detection result output by the vehicle target detection network.
As shown in fig. 5, in step S3, in the step of removing the pixel content within the anchor frame range in the road image to obtain the input image processed by the visual odometer, the anchor frames of the vehicle targets in the road image obtained in the foregoing steps are used as boundaries, and the pixels containing the vehicle targets inside the anchor frames are deleted or masked; the resulting image is used as the input image of the visual odometer.
With the above arrangement, since the vehicle target is no longer present in the road picture, the image feature points recognized and extracted by the visual odometer are no longer present on the vehicle target, thereby eliminating the potential disturbing influence of the moving vehicle.
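The pixel rejection itself only needs simple array operations; the sketch below assumes the detection network returns pixel-coordinate anchor frames as (x1, y1, x2, y2) tuples and zeroes out (masks) their contents, which is one possible realization of this step rather than the only one.

```python
import numpy as np

def remove_vehicle_pixels(road_image, anchor_frames):
    """Delete/mask the pixel content inside each detected anchor frame."""
    input_image = road_image.copy()
    h, w = input_image.shape[:2]
    for (x1, y1, x2, y2) in anchor_frames:
        # clamp the anchor frame to the image boundary, then blank out its content
        x1, y1 = max(0, int(x1)), max(0, int(y1))
        x2, y2 = min(w, int(x2)), min(h, int(y2))
        input_image[y1:y2, x1:x2] = 0
    return input_image

# usage sketch: mask two detected vehicles in a dummy road image
road = np.full((480, 640, 3), 255, dtype=np.uint8)
masked = remove_vehicle_pixels(road, [(50, 300, 200, 420), (400, 280, 560, 400)])
```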
As shown in fig. 5, in step S4, in the step of the visual odometer acquiring the input images of the adjacent frames and extracting the feature points in the input images with a feature extraction algorithm, representative points, i.e., feature points, are first selected from the input images (i.e., the images obtained by removing the vehicle targets inside the anchor frames in the foregoing step). The feature points remain stable when the camera viewing angle changes slightly, so the same feature points can be extracted from the different input images. Then, on the basis of the feature points and their correspondences, the pose of the visual camera can be estimated and the feature points can be localized. In this embodiment, one of the SIFT, SURF and ORB algorithms is adopted to extract the feature points, so that the obtained feature points have good repeatability and distinctiveness as well as high efficiency and locality.
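As one concrete choice among the SIFT/SURF/ORB options mentioned above, the following OpenCV sketch extracts ORB feature points and descriptors from a masked input image; the function name and parameter values are illustrative assumptions.

```python
import cv2

def extract_features(input_image, n_features=2000):
    """Extract ORB keypoints and binary descriptors from a masked input image."""
    gray = cv2.cvtColor(input_image, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=n_features)
    keypoints, descriptors = orb.detectAndCompute(gray, None)
    return keypoints, descriptors
```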
According to one embodiment of the present invention, after the feature points in the adjacent frame images are computed and extracted, the feature points need to be matched. Specifically, feature point matching solves the data association problem in the visual odometer, i.e., determining the correspondence between the currently observed feature points and the previously observed ones. By accurately matching descriptors between images, or between an image and a map, a great deal of burden can be removed from subsequent operations such as pose estimation and optimization. In step S5, in the step of adopting a feature matching algorithm and carrying out feature point matching on the input images of the adjacent frames based on the feature points, consider the input images at two adjacent times k and k+1: if feature points x_{m(k)}, m = 1, 2, …, M, are extracted in image I_k and feature points x_{n(k+1)}, n = 1, 2, …, N, are extracted in image I_{k+1}, matching can be performed with brute-force matching (Brute-Force match), i.e., for each feature point x_{m(k)}, the descriptor distance to every x_{n(k+1)} is measured, the distances are sorted, and the nearest feature point is taken as the matching point. The descriptor distance represents the similarity between two features, and different distance metric norms can be adopted in practical applications. In this embodiment, when the number of feature points is large, the amount of computation of the brute-force matching method becomes large, particularly when an input image of a certain frame is matched against a map. This can lead to computational delays that make it difficult to meet real-time requirements. In this case, the fast approximate nearest neighbor (Fast Library for Approximate Nearest Neighbors, FLANN) algorithm is better suited to cases with a large number of matching points.
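The two matching strategies discussed above can be sketched as follows, assuming ORB's binary (uint8) descriptors and illustrative parameter values: brute-force matching with cross-checking for small feature sets, and FLANN with an LSH index plus a ratio test when the number of feature points is large.

```python
import cv2

def match_brute_force(des1, des2):
    """Brute-force matching on binary descriptors (Hamming distance), sorted by distance."""
    bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    return sorted(bf.match(des1, des2), key=lambda m: m.distance)

def match_flann(des1, des2, ratio=0.75):
    """Approximate nearest-neighbour matching (FLANN, LSH index) with a ratio test."""
    index_params = dict(algorithm=6, table_number=6, key_size=12, multi_probe_level=1)  # 6 = FLANN_INDEX_LSH
    flann = cv2.FlannBasedMatcher(index_params, dict(checks=50))
    good = []
    for pair in flann.knnMatch(des1, des2, k=2):
        # keep a match only if it is clearly better than the second-best candidate
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    return good
```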
As shown in fig. 6, in step S6, based on the feature point matching result of the input images of the adjacent frames, a rotation matrix and a translation vector between the input images of the adjacent frames are obtained, so as to perform motion estimation on the vision camera, which includes:
constructing a relative pose relationship when the visual camera captures road images of adjacent frames based on the rotation matrix and the translation vector; in the present embodiment, the coordinate frame of the image frames of the vision camera is taken as the coordinate frame of the carrier. If the vision camera is a binocular camera or a depth camera, the left-eye coordinate frame is set as the origin without loss of generality. In the present embodiment, the relative pose relationship T_{k,k-1} between adjacent camera positions (or camera-system positions) is estimated from visual features, and the absolute pose C_k of each frame of input image with respect to the initial coordinate frame at k = 0 is then obtained based on the relative pose relationships.
In the present embodiment, the relative relationship between the vision camera positions at successive times k-1 and k is described by the transformation T_{k,k-1} ∈ R^{4×4}, and the relative pose relationship is obtained as:

$$T_{k,k-1} = \begin{bmatrix} R_{k,k-1} & t_{k,k-1} \\ 0 & 1 \end{bmatrix}$$

wherein R_{k,k-1} is the rotation matrix from the image coordinates at time k-1 to the image coordinates at time k, and t_{k,k-1} is the translation vector.
Further, absolute pose tracks of the visual camera with respect to capturing the initial frame road image are acquired based on the relative pose relationship; wherein, include:
acquiring the relative pose change of the visual camera corresponding to the road images of the adjacent frames based on the relative pose relation; in the present embodiment, the set of relative pose transformations of the consecutive input image frames, T_{1:n} = {T_{1,0}, …, T_{n,n-1}}, can be obtained based on the aforementioned relative pose relationship; this set contains the motion of the vision camera over the successive frames. For simplicity, T_{k,k-1} may be denoted as T_k.
Acquiring the absolute pose of the visual camera when capturing each frame of road image based on the pose of the visual camera when capturing the initial frame of road image and based on the relative pose change; in the present embodiment, the visual camera poses are denoted C_{0:n} = {C_0, …, C_n}, which includes the pose of the vision camera at the initial time k = 0. Further, the current pose C_n is obtained by chaining the transformations T_k (k = 1, …, n), so that C_n = C_{n-1} T_n, where C_0 is the camera pose at k = 0 and can be arbitrarily specified by the user.
The absolute poses of the vision camera are then connected sequentially in time order, and incremental pose trajectory reconstruction is carried out to obtain the absolute pose trajectory. In the present embodiment, based on the acquired vision camera poses C_{0:n}, the motion trajectory (i.e., the absolute pose trajectory) of the visual camera can be reconstructed incrementally by sequentially connecting adjacent poses; a minimal numerical sketch of this chaining is given after this list.
And performing motion estimation on the visual camera based on the absolute pose track.
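The chaining described above amounts to composing 4x4 homogeneous transforms. The numpy sketch below is an illustrative assumption (with C_0 taken as the identity unless specified) of how per-frame (R_k, t_k) estimates are accumulated into absolute poses C_k = C_{k-1} T_k.

```python
import numpy as np

def to_homogeneous(R, t):
    """Build the 4x4 relative pose T_k from a rotation matrix and a translation vector."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t.ravel()
    return T

def chain_poses(relative_motions, C0=None):
    """Chain relative motions [(R_1, t_1), ...] into absolute poses C_k = C_{k-1} T_k."""
    C = np.eye(4) if C0 is None else C0.copy()
    trajectory = [C.copy()]
    for R, t in relative_motions:
        C = C @ to_homogeneous(R, t)
        trajectory.append(C.copy())
    return trajectory  # list of 4x4 absolute poses, one per frame
```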
According to one embodiment of the present invention, the step of acquiring absolute pose tracks of the visual camera with respect to capturing the initial frame road image based on the relative pose relationship further comprises:
extracting a local track in the absolute pose track to perform local track optimization; wherein, include:
extracting track segments containing the absolute pose of m visual cameras from the absolute pose track as local tracks;
and iteratively calculating the minimum of the sum of squared 3D point cloud re-projection errors of the road images corresponding to the m absolute poses in the local track (this is called windowed bundle adjustment (Bundle Adjustment, BA), because the bundle adjustment is performed on a sliding window of m image frames), thereby completing the optimization of the local track.
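A windowed bundle adjustment of this kind can be prototyped with a generic nonlinear least-squares solver. The sketch below is a simplified, motion-only assumption (the 3D points are held fixed, rotations are parameterized as axis-angle vectors reprojected with cv2.projectPoints, and analytic Jacobians are omitted), not the exact optimization used by the invention.

```python
import numpy as np
import cv2
from scipy.optimize import least_squares

def reprojection_residuals(params, points_3d, observations, K, m):
    """Residuals between observed 2D points and 3D points reprojected by the m window poses."""
    residuals = []
    for k in range(m):
        rvec = params[6 * k:6 * k + 3]          # axis-angle rotation of window pose k
        tvec = params[6 * k + 3:6 * k + 6]      # translation of window pose k
        proj, _ = cv2.projectPoints(points_3d[k], rvec, tvec, K, None)
        residuals.append((proj.reshape(-1, 2) - observations[k]).ravel())
    return np.concatenate(residuals)

def windowed_ba(initial_poses, points_3d, observations, K):
    """Refine the last m poses by minimizing the windowed sum of squared reprojection errors."""
    m = len(initial_poses)                      # initial_poses: list of (rvec, tvec) pairs
    x0 = np.concatenate([np.concatenate([rvec, tvec]) for rvec, tvec in initial_poses])
    result = least_squares(reprojection_residuals, x0,
                           args=(points_3d, observations, K, m), method="lm")
    return result.x.reshape(m, 6)               # refined (rvec, tvec) per pose in the window
```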
According to one embodiment of the invention, the vision camera employs a monocular, binocular or depth camera.
According to one embodiment of the invention, if the vision camera adopts a monocular camera, the rotation matrix and the translation vector are obtained in a 2D-to-2D mode, or the rotation matrix and the translation vector are obtained in a 3D-to-2D mode;
if the rotation matrix and the translation vector are acquired in a 2D-to-2D manner, in which the two sets of corresponding features f_{k-1} and f_k at times k-1 and k are both given in 2D image coordinates (2-dimensional input images), then in step S5, feature point matching is performed using the input images of two adjacent frames;
in step S6, the step of obtaining the rotation matrix and the translation vector between the input images of the adjacent frames based on the feature point matching result of the input images of the adjacent frames includes:
based on the feature point matching result, acquiring an essential matrix of the input images of the adjacent frames;
decomposing the rotation matrix and the translation vector from the essential matrix;
if the rotation matrix and the translation vector are acquired in a 3D-to-2D manner, in which the features f_{k-1} at time k-1 are 3D points (3-dimensional) and the features f_k at time k are 2D image coordinates (2-dimensional input image), then in step S5, feature point matching is performed using the input images of three adjacent frames, which includes:
triangularizing the first two frames of input images to obtain a 3D image;
performing feature point matching based on the 3D image and the third frame input image;
in step S6, in the step of obtaining the rotation matrix and the translation vector between the input images of the adjacent frames based on the feature point matching result, the rotation matrix and the translation vector are obtained with the PnP algorithm based on the feature point matching result.
According to one embodiment of the invention, if the vision camera adopts a binocular camera or a depth camera, acquiring a rotation matrix and a translation vector in a 3D-to-3D manner, or acquiring the rotation matrix and the translation vector in a 3D-to-2D manner;
if the rotation matrix and the translation vector are acquired in a 3D-to-3D manner, the two sets of corresponding features f_{k-1} and f_k at times k-1 and k are both 3D coordinates. In step S5, feature point matching is performed separately on the two adjacent left-eye input images and on the two adjacent right-eye input images; or, the left-eye and right-eye input images at the same time are matched to generate a 3D matching image, and feature point matching is performed based on the 3D matching images of adjacent frames;
in step S6, the step of obtaining the rotation matrix and the translation vector between the input images of the adjacent frames based on the feature point matching result of the input images of the adjacent frames includes:
triangularizing the matched characteristic points based on the characteristic point matching result;
acquiring 3D features based on the triangulated feature points, and calculating a rotation matrix and a translation vector based on the 3D features;
if the rotation matrix and the translation vector are acquired in a 3D-to-2D manner, step S5 includes:
triangularizing the left-eye and right-eye input images at the previous moment to obtain a 3D image;
performing feature point matching based on the 3D image and the left-eye or right-eye input image at the later moment;
in step S6, in the step of obtaining the rotation matrix and the translation vector between the input images of the adjacent frames based on the feature point matching result, the rotation matrix and the translation vector are obtained with the PnP algorithm based on the feature point matching result.
To further illustrate the present invention, its workflow is exemplified below.
Example 1
The vision camera adopts a monocular camera, and the rotation matrix and the translation vector are acquired in a 2D-to-2D manner. The method specifically comprises the following steps:
1) Acquiring a new image frame I_k;
2) Extracting the feature points of the input images of the adjacent frames I_{k-1} and I_k;
3) Performing feature matching on the input images of the adjacent frames I_{k-1} and I_k;
4) Calculating the essential matrix of the input images of the adjacent frames I_{k-1} and I_k; in this embodiment, the geometric relationship between the feature-matched input images of adjacent frames is represented by the essential matrix E. The vision camera motion parameters contained in the essential matrix E are defined only up to an unknown scale factor for the translation, and E can be further expressed as:

$$E_k \simeq \hat{t}_k R_k, \qquad \hat{t}_k = \begin{bmatrix} 0 & -t_z & t_y \\ t_z & 0 & -t_x \\ -t_y & t_x & 0 \end{bmatrix}$$

wherein the symbol ≃ denotes equality up to a scalar multiplication, and (t_x, t_y, t_z) are the components of the translation vector t_k in the three coordinate directions.
In the present embodiment, the most important property for implementing the subsequent motion estimation in the 2D-to-2D manner is the epipolar constraint, which determines, for a feature point in one image, the straight line in the other image on which the corresponding feature point must lie, as shown in fig. 7. The epipolar constraint can be derived from the equation

$$\tilde{p}_k^{\top} E_k \, \tilde{p}_{k-1} = 0$$

where $\tilde{p}_k$ is the location of a feature in image I_k and $\tilde{p}_{k-1}$ is the location of the corresponding feature in the other image I_{k-1}; for simplicity, $\tilde{p}$ denotes the normalized image coordinates of the feature point.
In the present embodiment, the essential matrix is calculated from the 2D-to-2D feature matches using the epipolar constraint. The minimal solution uses 5 2D-to-2D correspondences; here, n ≥ 8 non-coplanar point correspondences are selected and the essential matrix is computed directly with the 8-point algorithm of Longuet-Higgins. With $\tilde{p}_k = (\tilde{u}_k, \tilde{v}_k, 1)^{\top}$ and $\tilde{p}_{k-1} = (\tilde{u}_{k-1}, \tilde{v}_{k-1}, 1)^{\top}$, each pair of matched features gives one constraint equation:

$$\begin{bmatrix} \tilde{u}_k\tilde{u}_{k-1} & \tilde{u}_k\tilde{v}_{k-1} & \tilde{u}_k & \tilde{v}_k\tilde{u}_{k-1} & \tilde{v}_k\tilde{v}_{k-1} & \tilde{v}_k & \tilde{u}_{k-1} & \tilde{v}_{k-1} & 1 \end{bmatrix} E = 0$$

wherein E = [e_1 e_2 e_3 e_4 e_5 e_6 e_7 e_8 e_9]^T stacks the entries of the essential matrix.

The constraint equations of the 8-point method can be assembled into the following homogeneous linear system:

$$A E = 0$$

wherein each row of A is formed from one pair of matched, normalized feature coordinates as above.

The essential matrix E can be obtained by solving the above homogeneous linear system by singular value decomposition (Singular Value Decomposition, SVD). If more than 8 points are used, an overdetermined system is generated (a system with more equations than unknowns); if the constraints it imposes are too strict for an exact solution to exist, a least-squares fit is performed and the least-squares solution is obtained. Writing the SVD of the estimated essential matrix as E = U S V^T, the values on the main diagonal should satisfy diag(S) = {s, s, 0}, i.e. the first and second singular values are equal and the third is 0. In order to obtain a valid essential matrix that satisfies this constraint, the estimate needs to be projected onto the space of valid essential matrices. The projected essential matrix is:

$$\bar{E} = U\,\mathrm{diag}\{1, 1, 0\}\,V^{\top}$$
When the points in 3D space are coplanar, the 8-point algorithm degenerates. In that case, the 5-point algorithm can be applied, since it also handles coplanar points. It should be noted that the 8-point algorithm is applicable both to calibrated (perspective or panoramic) cameras and to uncalibrated cameras, whereas the 5-point algorithm is applicable only to calibrated (perspective or panoramic) cameras.
5) Decomposing the essential matrix into the rotation matrix R_k and the translation vector t_k, and forming the relative pose relationship T_k of the vision camera positions based on R_k and t_k; in the present embodiment, the rotation matrix R and the translation vector t are extracted in this step from the essential matrix estimated in the previous step. In general, there are 4 different solutions of R and t for the same essential matrix; the correct pair of R and t can be identified by triangulating a point and keeping the solution for which the triangulated point lies in front of both cameras.
6) Calculating the relative scale and rescaling the translation vector t_k accordingly; in this embodiment, with the estimated R and t as initial values, the rotation and translation parameters can be refined using a nonlinear optimization method. On this basis, in order to reconstruct the trajectory of the image sequence, the successive transformations T_{0:n} need to be concatenated. Since the absolute scale of the translation between two images cannot be computed, their relative scale has to be computed instead. One approach is to triangulate the 3-dimensional point sets X_{k-1} and X_k from two subsets of the image pairs; the distance between two 3-dimensional points can be computed from the corresponding 3-dimensional point coordinates, and the ratio r between the distances computed in X_{k-1} and in X_k yields the corresponding scale. The distance ratio is expressed as:

$$r = \frac{\lVert X_{k-1,i} - X_{k-1,j} \rVert}{\lVert X_{k,i} - X_{k,j} \rVert}$$

for a pair of 3-dimensional points with indices i and j.
in view of robustness, redundant scale factors can be calculated and an average value employed; if outliers occur, the median is taken. Translation vector t k It can also be calculated from this distance ratio r. Calculation of the relative scale requires that features on a plurality of image frames have been matched (or tracked), at least three image frames.
7) Concatenating the pose transformations between the input image frames by calculating C_k = C_{k-1} T_k, thereby obtaining the absolute pose trajectory of the visual camera;
8) Repeating from step 1).
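Steps 2) to 5) of this example map directly onto standard OpenCV calls; the sketch below is an illustrative assumption (RANSAC is used for robustness to outlier matches), not the exact implementation of the invention. It estimates the essential matrix from matched 2D points and decomposes it into R_k and t_k, where t_k is recovered only up to scale, as discussed in step 6).

```python
import cv2
import numpy as np

def relative_pose_2d2d(pts_prev, pts_curr, K):
    """Estimate (R_k, t_k) from 2D-2D correspondences via the essential matrix."""
    E, inlier_mask = cv2.findEssentialMat(pts_prev, pts_curr, K,
                                          method=cv2.RANSAC, prob=0.999, threshold=1.0)
    # recoverPose performs the cheirality check over the four possible (R, t) decompositions
    _, R, t, _ = cv2.recoverPose(E, pts_prev, pts_curr, K, mask=inlier_mask)
    return R, t  # t has unit norm; the true scale must be recovered separately

# usage sketch: pts_prev, pts_curr are Nx2 float arrays of matched pixel coordinates,
# K is the 3x3 camera intrinsic matrix
```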
Example 2
The vision camera adopts a binocular camera or a depth camera, and the rotation matrix and the translation vector are acquired in a 3D-to-3D manner; for 3D-to-3D feature correspondences, the camera motion T_k can be computed by determining the rigid transformation that aligns the two sets of 3D features. 3D-to-3D feature correspondences are applicable only in stereoscopic vision. The method specifically comprises the following steps:
1) Acquiring the image pairs I_{l,k-1}, I_{r,k-1} and I_{l,k}, I_{r,k} of two adjacent frames;
2) Extracting the feature points of the input images I_{l,k-1}, I_{r,k-1} and I_{l,k}, I_{r,k} of the adjacent frames;
3) Performing feature matching on the input images I_{l,k-1}, I_{r,k-1} and I_{l,k}, I_{r,k} of the adjacent frames;
4) Triangulating the matched features for each image pair;
5) Acquiring the rotation matrix R_k and the translation vector t_k from the 3D features X_{k-1} and X_k, forming the relative pose relationship of the vision camera positions based on R_k and t_k, and calculating T_k; in the present embodiment, the general approach to calculating T_k is to minimize the L2 distance between the two sets of 3D features:

$$T_k = \arg\min_{T} \sum_{i} \left\lVert \tilde{X}_k^{\,i} - T\,\tilde{X}_{k-1}^{\,i} \right\rVert$$

wherein i denotes the i-th feature and $\tilde{X}$ denotes the homogeneous coordinates of the 3D feature point.
T_k is calculated using more than 3 pairs of 3D features X_{k-1}, X_k. Specifically, the translation vector t_k is calculated by the following formula:

$$t_k = \bar{X}_k - R_k \bar{X}_{k-1}$$

wherein the bar denotes the arithmetic mean of the corresponding set of 3D feature points.
The rotation matrix R_k can be calculated by singular value decomposition (SVD). With $U S V^{\top} = \mathrm{svd}\big((X_{k-1} - \bar{X}_{k-1})(X_k - \bar{X}_k)^{\top}\big)$, the rotation matrix is

$$R_k = V U^{\top}$$
since the transformation calculation of the 3D to 3D correspondence has an absolute scale, the trajectory of the image sequence can be obtained directly by connecting the individual transformation processes.
6) Concatenating the pose transformations between the image frames by calculating C_k = C_{k-1} T_k;
7) Repeating from step 1).
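Step 5) of this example is the classical closed-form rigid alignment of two 3D point sets via SVD; the numpy sketch below is an illustrative assumption of how R_k and t_k could be computed from the triangulated features X_{k-1} and X_k (given as Nx3 arrays of corresponding points), consistent with the formulas above.

```python
import numpy as np

def align_3d_3d(X_prev, X_curr):
    """Closed-form (R_k, t_k) aligning X_prev (time k-1) onto X_curr (time k) via SVD."""
    mean_prev = X_prev.mean(axis=0)
    mean_curr = X_curr.mean(axis=0)
    H = (X_prev - mean_prev).T @ (X_curr - mean_curr)   # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                            # guard against a reflection
        Vt[2, :] *= -1
        R = Vt.T @ U.T
    t = mean_curr - R @ mean_prev                       # t_k = mean(X_k) - R_k * mean(X_{k-1})
    return R, t
```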
Example 3
The vision camera adopts a monocular camera, a binocular camera or a depth camera, and the rotation matrix and the translation vector are acquired in a 3D-to-2D manner; in this embodiment, a PnP (Perspective-n-Point) method is used to solve the motion from 3D-to-2D point correspondences. Examples include P3P, which estimates the pose from 3 points, the direct linear transformation (Direct Linear Transformation, DLT), EPnP (Efficient PnP), UPnP, and so on. Alternatively, a nonlinear optimization approach can be used to construct a least-squares problem and solve it iteratively, i.e. bundle adjustment (BA).
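One way to realize this 3D-to-2D step is OpenCV's RANSAC-wrapped PnP solver; the sketch below is an illustrative assumption (EPnP inside RANSAC, with illustrative thresholds), not the patent's own implementation.

```python
import cv2
import numpy as np

def relative_pose_3d2d(points_3d, points_2d, K):
    """Estimate (R_k, t_k) from 3D points of the previous frame(s) and their 2D observations."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(points_3d, points_2d, K, None,
                                                 flags=cv2.SOLVEPNP_EPNP,
                                                 reprojectionError=2.0, iterationsCount=100)
    if not ok:
        raise RuntimeError("PnP failed: not enough consistent correspondences")
    R, _ = cv2.Rodrigues(rvec)   # convert the axis-angle rotation to a rotation matrix
    return R, tvec
```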
According to an embodiment of the present invention, a visual odometer device for the aforementioned visual odometer method of the present invention includes:
the visual camera is used for acquiring an external road image;
the vehicle target detection module is used for receiving the road image output by the visual camera, detecting a target vehicle and marking an anchor frame;
the image processing module is used for eliminating pixels in the anchor frame range in the road image and is used as an input image processed by the visual odometer;
the visual odometer module is used for matching input images of adjacent frames and outputting characteristic point matching results, and acquiring a rotation matrix and a translation vector between the input images of the adjacent frames based on the characteristic point matching results, so as to perform motion estimation on the visual camera.
As shown in fig. 1, according to one embodiment of the present invention, the vision camera is a monocular camera, a binocular camera, or a depth camera.
The foregoing is merely exemplary of embodiments of the invention and, as regards devices and arrangements not explicitly described in this disclosure, it should be understood that this can be done by general purpose devices and methods known in the art.
The above description is only one embodiment of the present invention, and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A visual odometer method of combating dynamic target disturbances, comprising:
s1, constructing a vehicle target detection network for detecting a vehicle;
s2, acquiring road images of adjacent frames output by a vision camera, respectively detecting target vehicles on the road images based on the vehicle target detection network, and respectively marking the identified target vehicles by adopting anchor frames;
s3, eliminating pixel contents in the anchor frame range in the road image respectively to serve as an input image processed by a visual odometer;
s4, the visual odometer acquires the input images of the adjacent frames, and adopts a feature extraction algorithm to extract feature points in the input images respectively;
s5, adopting a feature matching algorithm and carrying out feature point matching on the input images of the adjacent frames based on the feature points;
s6, based on the characteristic point matching result of the input images of the adjacent frames, acquiring a rotation matrix and a translation vector between the input images of the adjacent frames, and performing motion estimation on the vision camera.
2. The method according to claim 1, wherein in step S6, based on the feature point matching result of the input images of the adjacent frames, the step of obtaining a rotation matrix and a translation vector between the input images of the adjacent frames for motion estimation of the vision camera includes:
constructing a relative pose relationship when the visual camera captures the road image of the adjacent frame based on the rotation matrix and the translation vector;
acquiring absolute pose tracks of the visual camera when capturing an initial frame road image based on the relative pose relation;
and performing motion estimation on the visual camera based on the absolute pose track.
3. The visual odometry method of claim 2, wherein the step of obtaining absolute pose trajectories of the visual camera with respect to capturing an initial frame of road image based on the relative pose relationship comprises:
acquiring the relative pose change of the visual camera corresponding to the road image of the adjacent frame based on the relative pose relation;
acquiring an absolute pose of the visual camera when capturing the road image of each frame based on the pose of the visual camera when capturing the road image of an initial frame and based on the relative pose change;
and based on the absolute pose of the vision camera, sequentially connecting according to a time sequence, and performing incremental pose track reconstruction to acquire the absolute pose track.
4. A visual odometry method according to claim 3, wherein the step of obtaining absolute pose trajectories of the visual camera with respect to capturing an initial frame of road image based on the relative pose relationship further comprises:
extracting a local track in the absolute pose track to perform local track optimization; wherein, include:
extracting track segments containing m absolute poses of the vision camera from the absolute pose track as the local track;
and iteratively calculating the minimum value of the sum of the 3D point cloud re-projection error squares of the road image corresponding to the m absolute poses in the local track, and completing the optimization of the local track.
5. The visual odometer method of claim 4, wherein in the step of constructing a relative pose relationship for the visual camera capturing the road image of adjacent frames based on the rotation matrix and the translation vector, the relative pose relationship is expressed as:

$$T_{k,k-1} = \begin{bmatrix} R_{k,k-1} & t_{k,k-1} \\ 0 & 1 \end{bmatrix}$$

wherein R_{k,k-1} is a rotation matrix from the image coordinates at time k-1 to the image coordinates at time k, and t_{k,k-1} is a translation vector.
6. The visual odometry method of claim 5, wherein the visual camera is a monocular, binocular or depth camera.
7. The visual odometer method according to claim 6, wherein if the visual camera is a monocular camera, the rotation matrix and the translation vector are acquired in a 2D-to-2D manner or in a 3D-to-2D manner;
if the rotation matrix and the translation vector are acquired in the 2D-to-2D manner, in step S5, feature point matching is performed on the input images of two adjacent frames;
and in step S6, the step of acquiring the rotation matrix and the translation vector between the input images of the adjacent frames based on the feature point matching result comprises:
acquiring an essential matrix between the input images of the adjacent frames based on the feature point matching result;
and decomposing the rotation matrix and the translation vector from the essential matrix;
if the rotation matrix and the translation vector are acquired in the 3D-to-2D manner, in step S5, feature point matching is performed using the input images of three adjacent frames, which comprises:
triangulating the input images of the first two frames to obtain a 3D image;
and performing feature point matching between the 3D image and the input image of the third frame;
and in step S6, in the step of acquiring the rotation matrix and the translation vector between the input images of the adjacent frames based on the feature point matching result, the rotation matrix and the translation vector are acquired with a PnP algorithm based on the feature point matching result.
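As an illustration of the two monocular alternatives in claim 7, the sketch below uses OpenCV's standard epipolar-geometry and PnP routines; K denotes the camera intrinsic matrix, the pixel arrays come from the feature point matching of step S5, and the correspondence of the same tracked points across three frames is assumed to be available.

```python
import cv2
import numpy as np

def pose_2d_2d(pts1, pts2, K):
    """Claim 7, 2D-to-2D: essential matrix from matched points, then decomposition."""
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t   # monocular translation is recovered only up to an unknown scale

def pose_3d_2d(pts_f1, pts_f2, pts_f3, K, R12, t12):
    """Claim 7, 3D-to-2D: triangulate the first two frames, then PnP on the third.

    pts_f1/pts_f2/pts_f3 hold the pixel coordinates of the same tracked points in
    frames 1, 2 and 3; (R12, t12) is the relative pose between frames 1 and 2,
    e.g. obtained from pose_2d_2d above.
    """
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R12, t12.reshape(3, 1)])
    X_h = cv2.triangulatePoints(P1, P2,
                                pts_f1.astype(np.float64).T,
                                pts_f2.astype(np.float64).T)   # 4xN homogeneous
    X = np.ascontiguousarray((X_h[:3] / X_h[3]).T)             # Nx3 Euclidean points
    _, rvec, tvec, _ = cv2.solvePnPRansac(X, pts_f3, K, None)
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec
```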
8. The visual odometer method according to claim 6, wherein if the visual camera is a binocular camera or a depth camera, the rotation matrix and the translation vector are acquired in a 3D-to-3D manner or in a 3D-to-2D manner;
if the rotation matrix and the translation vector are acquired in the 3D-to-3D manner, in step S5, feature point matching is performed separately on the left-eye input images of two adjacent frames and on the right-eye input images of two adjacent frames; or, the left-eye and right-eye input images at the same moment are matched to generate a 3D matching image, and feature point matching is performed between the 3D matching images of the adjacent frames;
and in step S6, the step of acquiring the rotation matrix and the translation vector between the input images of the adjacent frames based on the feature point matching result comprises:
triangulating the matched feature points based on the feature point matching result;
and acquiring 3D features from the triangulated feature points and calculating the rotation matrix and the translation vector based on the 3D features;
if the rotation matrix and the translation vector are acquired in the 3D-to-2D manner, step S5 comprises:
triangulating the left-eye and right-eye input images at the previous moment to obtain a 3D image;
and performing feature point matching between the 3D image and the left-eye or right-eye input image at the subsequent moment;
and in step S6, in the step of acquiring the rotation matrix and the translation vector between the input images of the adjacent frames based on the feature point matching result, the rotation matrix and the translation vector are acquired with a PnP algorithm based on the feature point matching result.
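For the 3D-to-3D branch of claim 8, once the matched feature points of the adjacent frames have both been triangulated, the rotation matrix and translation vector admit a closed-form SVD (Kabsch) solution; the sketch below shows this one standard realization (the claim does not prescribe a specific solver), while the 3D-to-2D branch can reuse the PnP call from the previous sketch.

```python
import numpy as np

def rigid_align_3d(P, Q):
    """Claim 8, 3D-to-3D: find R, t minimizing sum ||R @ P_i + t - Q_i||^2 for
    matched 3D feature points P (frame k-1) and Q (frame k), both of shape (N, 3)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)                      # centroids
    H = (P - cp).T @ (Q - cq)                                    # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflection
    R = Vt.T @ D @ U.T
    t = cq - R @ cp
    return R, t
```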
9. A visual odometer device employing the visual odometer method of any one of claims 1 to 8, comprising:
a visual camera, configured to acquire external road images;
a vehicle target detection module, configured to receive the road images output by the visual camera, detect target vehicles, and mark them with anchor boxes;
an image processing module, configured to eliminate the pixels within the anchor box regions of the road images and provide the resulting images as input images for the visual odometer;
and a visual odometer module, configured to match the input images of adjacent frames, output feature point matching results, and acquire a rotation matrix and a translation vector between the input images of the adjacent frames based on the feature point matching results, so as to perform motion estimation of the visual camera.
10. The visual odometer device according to claim 9, wherein the visual camera is a monocular camera, a binocular camera, or a depth camera.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211171786.4A CN116151320A (en) | 2022-09-26 | 2022-09-26 | Visual odometer method and device for resisting dynamic target interference |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116151320A true CN116151320A (en) | 2023-05-23 |
Family
ID=86353201
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211171786.4A Pending CN116151320A (en) | 2022-09-26 | 2022-09-26 | Visual odometer method and device for resisting dynamic target interference |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116151320A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116399350A (en) * | 2023-05-26 | 2023-07-07 | 北京理工大学 | Method for determining semi-direct method visual odometer fused with YOLOv5 |
CN116399350B (en) * | 2023-05-26 | 2023-09-01 | 北京理工大学 | Method for determining semi-direct method visual odometer fused with YOLOv5 |
Similar Documents
Publication | Title
---|---
EP3504682B1 (en) | Simultaneous localization and mapping with an event camera
Kneip et al. | Robust real-time visual odometry with a single camera and an IMU
Alcantarilla et al. | On combining visual SLAM and dense scene flow to increase the robustness of localization and mapping in dynamic environments
Clipp et al. | Robust 6dof motion estimation for non-overlapping, multi-camera systems
US20220051425A1 (en) | Scale-aware monocular localization and mapping
Honegger et al. | Embedded real-time multi-baseline stereo
CN106033614B (en) | A kind of mobile camera motion object detection method under strong parallax
Giering et al. | Multi-modal sensor registration for vehicle perception via deep neural networks
CN112802096A (en) | Device and method for realizing real-time positioning and mapping
Cattaneo et al. | Cmrnet++: Map and camera agnostic monocular visual localization in lidar maps
US11069071B1 (en) | System and method for egomotion estimation
KR20140054710A (en) | Apparatus and method for generating 3d map
CN111738032A (en) | Vehicle driving information determination method and device and vehicle-mounted terminal
Burlacu et al. | Obstacle detection in stereo sequences using multiple representations of the disparity map
Geiger et al. | Object flow: A descriptor for classifying traffic motion
Oreifej et al. | Horizon constraint for unambiguous uav navigation in planar scenes
CN116151320A (en) | Visual odometer method and device for resisting dynamic target interference
CN108090930A (en) | Barrier vision detection system and method based on binocular solid camera
CN103236053B (en) | A kind of MOF method of moving object detection under mobile platform
Sheikh et al. | Geodetic alignment of aerial video frames
Peng et al. | Fast 3D map reconstruction using dense visual simultaneous localization and mapping based on unmanned aerial vehicle
Lee et al. | Globally consistent video depth and pose estimation with efficient test-time training
Pagel | Robust monocular egomotion estimation based on an iekf
Diskin et al. | UAS exploitation by 3D reconstruction using monocular vision
Wang et al. | Research on omnidirectional ORB-SLAM2 for mobile robots
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||