CN116804553A - Odometer system and method based on event camera/IMU/natural road sign - Google Patents

Odometer system and method based on event camera/IMU/natural road sign

Info

Publication number
CN116804553A
Authority
CN
China
Prior art keywords
event
feature points
camera
tracking
points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310771745.7A
Other languages
Chinese (zh)
Inventor
汤新华 (Tang Xinhua)
鲁佳豪 (Lu Jiahao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202310771745.7A priority Critical patent/CN116804553A/en
Publication of CN116804553A publication Critical patent/CN116804553A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/005 Navigation with correlation of navigation data from several sources, e.g. map or contour matching
    • G01C21/10 Navigation by using measurements of speed or acceleration
    • G01C21/12 Navigation by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C21/16 Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
    • G01C21/165 Inertial navigation combined with non-inertial navigation instruments
    • G01C21/1656 Inertial navigation combined with non-inertial navigation instruments with passive imaging devices, e.g. cameras
    • G01C21/20 Instruments for performing navigational calculations
    • G01C22/00 Measuring distance traversed on the ground by vehicles, persons, animals or other moving solid bodies, e.g. using odometers, using pedometers

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an odometer system and method based on an event camera/IMU/natural road signs, wherein the system comprises an event camera tracking module, a traditional camera tracking module and a joint optimization module. The invention utilizes the complementary performance advantages of the event camera and the traditional camera, and constructs the visual odometer by combining the two, which improves the robustness of the system in high-speed, high-dynamic scenes. By presetting natural landmarks with known position information in the scene, the pose at the current moment is calculated and updated, realizing a function similar to loop-closure detection. The invention also adds a step of eliminating dynamic points in the traditional camera tracking module, which further improves the robustness of the system in dynamic scenes.

Description

Odometer system and method based on event camera/IMU/natural road sign
Technical Field
The invention belongs to the technical field of computer software, relates to simultaneous localization and mapping (SLAM) technology, and in particular relates to an odometer system and method based on an event camera/IMU/natural road signs.
Background
Simultaneous localization and mapping (SLAM) technology has important applications in fields such as autonomous robot control and augmented/virtual reality. Visual odometry has made tremendous progress in recent years thanks to the good complementary performance of cameras and inertial sensors. However, due to well-known limitations of traditional cameras (motion blur and low dynamic range), these visual odometry methods still struggle in some situations, such as high-speed motion or high dynamic range scenes. The advent of event cameras offers great potential to overcome these problems. Unlike traditional cameras, which output pixel intensities at a fixed frame rate, event cameras (such as the dynamic vision sensor, DVS) output only changes in pixel intensity. Event cameras have significant advantages over traditional cameras in several respects: microsecond-level latency and a very high dynamic range (typically more than twice that of traditional cameras). In particular, since all pixels collect light information independently, event cameras are not affected by motion blur. Research on state estimation with event cameras has developed to a certain extent, and a few pure event camera schemes as well as schemes combining event cameras with IMUs for pose estimation have been proposed, but most of them lack a loop-closure detection module, so the long-term operating results of such systems are difficult to guarantee.
Disclosure of Invention
In order to overcome the defects of the prior art, and considering the complementarity of the event camera and the traditional camera in performance as well as the excellent robustness and accuracy of visual-inertial odometry, the invention provides a novel odometer system and method based on the combination of an event camera, an IMU and natural road signs. By arranging preset natural road signs with known position information and recognizing them with an object detection algorithm, the pose is calculated and updated after recognition, realizing a function similar to loop-closure detection and improving the performance of the system.
In order to achieve the above purpose, the present invention provides the following technical solutions:
the method for realizing the odometer based on the event camera/IMU/natural road sign comprises the following steps:
step one, event camera tracking
According to the event stream output by the event camera, extracting feature points directly on the raw events based on the SAE (Surface of Active Events), tracking the extracted feature points by the LK optical flow method, calculating the inverse depth of the feature points to generate map points, and further tracking the map points;
step two, traditional camera tracking
Extracting feature points on the standard image frames and tracking them, with a YOLOv5 object detection model added to identify preset natural landmarks, and, when a preset natural landmark is detected, calculating the pose by the PnP algorithm and updating the pose of the current frame;
step three, joint optimization
Realizing visual-inertial fusion through nonlinear optimization by converting the fusion problem into the minimization of an objective function; the objective function comprises four parts: the prior error after marginalization, the reprojection error of the event camera, the reprojection error of the traditional camera, and the inertial measurement error term; an accurate pose estimate is obtained after solving with the g2o optimization library.
Further, events are represented using a Time Surface (TS) with polarity:

T_p(x, y, t) = p \cdot e^{-(t - S(x, y))/\tau}

wherein T_p is the intensity of the event, x, y are pixel coordinates, t is the time at which the event occurred, p represents the polarity of the pixel intensity change, S(x, y) is the timestamp of the last event triggered at that pixel, and τ represents a constant decay rate.
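As a minimal NumPy sketch of this representation (the event tuple order (x, y, t, p), sensor resolution and decay constant are illustrative assumptions, not taken from the patent):

```python
import numpy as np

def update_sae(sae, events):
    """Surface of Active Events: for each pixel, keep the timestamp of the
    most recent event fired there."""
    for x, y, t, p in events:          # event = (column, row, timestamp, polarity)
        sae[y, x] = t
    return sae

def time_surface(sae, t_now, polarity, tau=0.1):
    """Polarity time surface T_p(x, y, t) = p * exp(-(t - S(x, y)) / tau):
    recently fired pixels are near +/-1, stale pixels decay toward 0."""
    return polarity * np.exp(-(t_now - sae) / tau)

# Usage: a 480x640 sensor, two synthetic events, then the decayed surface.
sae = np.full((480, 640), -np.inf)     # -inf marks pixels that never fired
sae = update_sae(sae, [(10, 20, 0.05, 1), (11, 20, 0.08, 1)])
ts = time_surface(sae, t_now=0.10, polarity=1, tau=0.1)
```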
Further, the first step specifically includes the following sub-steps:
step 1, generating SAE according to an event stream provided by an event camera, wherein the SAE records the time stamp of the last trigger event of each pixel in the event stream, the SAE is a two-dimensional array, each element in the array corresponds to one pixel, and the value of each element corresponds to the time stamp of the last trigger event;
step 2, when a new event stream is received, extracting corner points on the generated SAE by using the Arc algorithm, the Arc algorithm maintaining two annular event sets and detecting whether a continuous arc or its complementary arc lies within a certain range to extract the corner points;
and step 3, tracking the extracted feature points by the LK optical flow method, triangulating to generate map points and tracking them further if tracking succeeds, discarding the feature points if tracking fails, and extracting new feature points when the number of feature points falls below a set threshold.
Further, the specific steps of extracting feature points by the Arc algorithm in step 2 are as follows (a schematic sketch follows these steps):
(1) Setting up two annular event sets, each containing the N most recent events;
(2) When a new event arrives, adding it to both annular event sets, computing the angles between the event and the other events in the ring, and recording the continuous arc or its complementary arc range;
(3) Checking whether the continuous arc or complementary arc corresponding to each newly added event lies within a certain range, and if so, marking the event as a corner point;
(4) Pruning the oldest event from the two annular event sets to ensure that each set contains only the most recent N events.
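A schematic Python sketch of the ring-buffer bookkeeping above; the set size N and the admissible arc bounds are illustrative assumptions, and the real Arc detector additionally reasons over SAE timestamps along circular pixel patterns:

```python
import math
from collections import deque

class RingEventSet:
    """Keep the N most recent events; test whether the directions from a new
    event to the stored ones form a contiguous arc of admissible length."""
    def __init__(self, n_events=8, arc_min=math.pi / 4, arc_max=3 * math.pi / 2):
        self.events = deque(maxlen=n_events)   # oldest event drops automatically
        self.arc_min, self.arc_max = arc_min, arc_max

    def is_corner(self, x, y, t):
        self.events.append((x, y, t))
        if len(self.events) < self.events.maxlen:
            return False                       # not enough context yet
        # Directions from the new event to every older event in the set.
        angles = sorted(math.atan2(ey - y, ex - x)
                        for ex, ey, _ in list(self.events)[:-1])
        # Largest angular gap; the covered (continuous) arc is its complement.
        gaps = [(angles[(i + 1) % len(angles)] - angles[i]) % (2 * math.pi)
                for i in range(len(angles))]
        arc = 2 * math.pi - max(gaps)
        # Corner if the arc or its complementary arc lies in the admitted range.
        return (self.arc_min <= arc <= self.arc_max
                or self.arc_min <= 2 * math.pi - arc <= self.arc_max)
```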
Further, in step 3, key frames are extracted during tracking and the extracted key frames are sent to the joint optimization module for optimization calculation. Key frame selection sets thresholds on the number of currently tracked features and on the average parallax of tracked feature points between two consecutive timestamps: a new key frame is selected when the number of tracked features falls below its threshold or the average parallax exceeds its threshold, as in the decision sketch below.
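A minimal sketch of this decision rule (threshold values are illustrative):

```python
def need_new_keyframe(num_tracked: int, mean_parallax_px: float,
                      min_tracked: int = 50, max_parallax_px: float = 10.0) -> bool:
    """New keyframe when tracking degrades (too few tracked features) or the
    view has moved enough (large average parallax, in pixels, between two
    consecutive timestamps)."""
    return num_tracked < min_tracked or mean_parallax_px > max_parallax_px
```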
Further, the second step specifically includes the following sub-steps:
step 1, pre-collecting image data and training the YOLOv5 model;
step 2, performing object detection on the standard image frames of the traditional camera and identifying whether a landmark is present in the image;
step 3, extracting feature points from the standard image frame, identifying potential dynamic objects from the detection result, dividing the feature points in the potential dynamic object regions into static and dynamic feature points by static consistency, and eliminating the dynamic feature points;
step 4, tracking the extracted static feature points using the optical flow method;
and step 5, if a landmark was identified in step 2, calculating the pose using the PnP algorithm and updating the camera pose at that moment.
Further, the specific steps of eliminating the dynamic feature points in step 3 are as follows (a sketch follows these steps):
(1) Process each frame of the image with the YOLOv5 algorithm and identify all objects in the image;
(2) Store the rectangular region where each potential dynamic object is located in the detection result in the format (x_1, y_1, h, w), where x_1 is the abscissa of the upper-left corner of the region, y_1 is the ordinate of the upper-left corner, h is the height of the rectangular region, and w is its width;
(3) Divide the feature points in the potential moving object regions into static and dynamic feature points through static consistency, and finally eliminate the dynamic feature points. Static and dynamic feature points are distinguished as follows:
For feature points i, j in the image frame at time t and the corresponding feature points in the image frame at time t+1, the distance between two feature points in the same image frame is defined as

d_t(i, j) = \sqrt{(u_i - u_j)^2 + (v_i - v_j)^2}

wherein u, v are the abscissa and ordinate of a feature point in the pixel coordinate system. When the distances between the corresponding feature points in the image frames at different times are equal, or their difference lies within a certain threshold, the two feature points are static feature points. The threshold is determined from the feature points in non-potential-moving-object regions:

\varepsilon = \frac{1}{N} \sum_{i, j \in S} \left| d_{t+1}(i, j) - d_t(i, j) \right|

wherein S is the set of feature points in non-potential-moving-object regions, d_t(i, j) is the distance between any two feature points of S at time t, and N is the number of feature points in S.
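A minimal NumPy sketch of this static-consistency test, assuming features are already matched between t and t+1 and the boxes come from the detector; note the threshold here averages over background pairs, whereas the patent's formula normalizes the sum by the feature count N:

```python
import numpy as np

def pairwise_dist(pts):
    """d_t(i, j): Euclidean distance between every pair of features (pixels)."""
    diff = pts[:, None, :] - pts[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def reject_dynamic_points(box_prev, box_curr, bg_prev, bg_curr):
    """Split features inside a potential-dynamic-object box into static/dynamic.
    box_prev, box_curr: (M, 2) matched features inside the box at t and t+1.
    bg_prev, bg_curr:   (N, 2) matched features outside all boxes (assumed static).
    """
    # Threshold from the background set: average change of pairwise distances.
    eps = np.abs(pairwise_dist(bg_curr) - pairwise_dist(bg_prev)).mean()
    # A box feature is static if its distances to the other box features are,
    # on average, preserved between the two frames.
    change = np.abs(pairwise_dist(box_curr) - pairwise_dist(box_prev))
    static = change.mean(axis=1) <= eps
    return box_curr[static], box_curr[~static]
```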
Further, the specific steps of updating the pose estimate by recognizing the preset natural landmarks in step 2 and step 5 are as follows (a PnP sketch follows these steps):
(1) Acquire image data of the preset natural landmarks arranged in the scene in advance, from different angles;
(2) Label the targets on the acquired image data, marking out the preset natural landmark portions;
(3) Train the model using the labeled dataset;
(4) Identify the landmark in the image using the trained model; after successful identification, obtain the landmark's 3D coordinates in the world coordinate system and its 2D coordinates in the image, solve with the PnP algorithm to obtain the pose estimate, and update the pose in real time.
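A minimal sketch of step (4), assuming OpenCV's solvePnP, a landmark whose keypoint layout in its own frame is known by design, and a surveyed landmark pose T_WL; all names are illustrative:

```python
import cv2
import numpy as np

def pose_from_landmark(pts3d_L, pts2d, K, T_WL):
    """Camera pose in the world frame from one detected landmark.
    pts3d_L: (N, 3) landmark keypoints in the landmark frame (known by design).
    pts2d:   (N, 2) their detected pixel positions.
    K:       3x3 camera intrinsics; T_WL: 4x4 landmark pose in the world frame.
    """
    ok, rvec, tvec = cv2.solvePnP(pts3d_L.astype(np.float64),
                                  pts2d.astype(np.float64), K, None)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)
    T_CL = np.eye(4)                     # maps landmark coords into camera coords
    T_CL[:3, :3], T_CL[:3, 3] = R, tvec.ravel()
    T_LC = np.linalg.inv(T_CL)           # camera pose in the landmark frame
    return T_WL @ T_LC                   # T_WC = T_WL * T_LC
```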
Further, the objective function in step three is:

\chi^{*} = \arg\min_{\chi} \Big( \| r_p \|^{2} + \sum W_{B} \| r_{B} \|^{2} + \sum W_{E} \| r_{E} \|^{2} + \sum W_{C} \| r_{C} \|^{2} \Big)

wherein χ on the left side of the equation is the variable to be estimated; on the right side, the first term is the prior residual after marginalization, the second term is the residual of the IMU pre-integration, the third term is the measurement residual of the event camera, and the fourth term is the measurement residual of the traditional camera; W_(·) is the weight corresponding to each residual.
The invention also provides a visual odometer system based on the event camera/traditional camera/IMU/natural road signs, used to implement the above visual odometer implementation method, and comprising an event camera tracking module, a traditional camera tracking module and a joint optimization module; wherein:
the event camera tracking module is used for directly extracting feature points on an original event based on SAE (Surface ofActive Events) according to an event stream output by the event camera, tracking the extracted feature points by using an LK optical flow method, calculating the inverse depth of the extracted feature points, generating map points, and further tracking the map points;
the traditional camera tracking module is used for extracting characteristic points on a standard image frame and tracking, adding a target detection model YOLOV5 for identifying a preset natural landmark, and calculating the pose by using a PnP algorithm and updating the pose of the current frame when the preset natural landmark is detected;
the joint optimization module is used for realizing fusion of visual inertia through nonlinear optimization, converting the fusion problem of the visual inertia into a minimization problem of an objective function, wherein the objective function comprises four parts, an priori error after marginalization, a reprojection error of an event camera and a traditional camera, and an error term of inertial measurement, and obtaining accurate pose estimation after solving by using a G2O optimization library.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention exploits the complementary performance advantages of the event camera and the traditional camera, and constructs the visual odometer by combining the event camera, the traditional camera, the IMU and natural road signs. This solves the failure of traditional visual-inertial odometer methods in high-speed, high-dynamic scenes, while improving robustness and accuracy compared with schemes that combine only an event camera with an IMU.
2. Landmarks with known positions arranged in the scene are recognized by the object detection algorithm, and the camera pose at the current moment is calculated and updated, realizing a function similar to loop-closure detection without requiring an actual loop in the trajectory.
3. Potential moving objects in the scene are identified by the object detection algorithm, and dynamic feature points are found and removed through static consistency, which reduces mismatches in feature matching and improves the robustness of the system.
Drawings
FIG. 1 is a block diagram of a system according to the present invention.
FIG. 2 shows the steps of recognizing a landmark and updating the pose.
FIG. 3 shows the steps of eliminating dynamic points.
Detailed Description
The technical scheme provided by the present invention will be described in detail with reference to the following specific examples, and it should be understood that the following specific examples are only for illustrating the present invention and are not intended to limit the scope of the present invention.
The embodiment of the invention provides an odometer system based on an event camera/IMU/natural road signs. A traditional camera, an event camera and an IMU are mounted on a mobile cart platform and rigidly connected; the intrinsic parameters of each sensor and the extrinsic parameters between the sensors are calibrated in advance. Landmarks with known positions are arranged in the task scene, and the image dataset required to train the object detection model is collected in advance.
The world coordinate system is established with the starting point of the mobile cart as the origin, the heading direction of the cart as the positive x-axis, the left of the cart as the positive y-axis, and vertically upward from the cart body as the positive z-axis. While the mobile cart operates, the event stream provided by the event camera, the image stream provided by the traditional camera, and the acceleration and angular velocity measurements provided by the IMU are acquired in real time.
The system comprises an event camera tracking module, a traditional camera tracking module and a joint optimization module. The event camera tracking module directly extracts feature points on the raw events based on the SAE (Surface of Active Events) according to the event stream output by the event camera, tracks the extracted feature points by the LK optical flow method, calculates their inverse depth to generate map points, and tracks the map points further. The traditional camera tracking module extracts feature points on the standard image frames and tracks them, with a YOLOv5 object detection model added to recognize preset natural landmarks; when one is detected, the pose is calculated by the PnP algorithm and the pose of the current frame is updated. The joint optimization module realizes visual-inertial fusion through nonlinear optimization by converting the fusion problem into the minimization of an objective function comprising four parts: the prior error after marginalization, the reprojection error of the event camera, the reprojection error of the traditional camera, and the inertial measurement error term; an accurate pose estimate is obtained after solving with the g2o optimization library.
The present invention represents events using a Time Surface (TS) with polarity:

T_p(x, y, t) = p \cdot e^{-(t - S(x, y))/\tau}

wherein T_p is the intensity of the event, x, y are pixel coordinates, t is the time at which the event occurred, p represents the polarity of the pixel intensity change, S(x, y) is the timestamp of the last event triggered at that pixel, and τ represents a constant decay rate.
The following processing is carried out by adopting an event camera tracking module:
step 1, generating SAE according to the event stream provided by the event camera, wherein SAE records the time stamp of the last trigger event of each pixel in the event stream, and the SAE is a two-dimensional array, each element in the array corresponds to one pixel, and the value of each element corresponds to the time stamp of the last trigger event.
Step 2: when a new event stream is received, corner points are extracted on the generated SAE using the Arc algorithm. The Arc algorithm maintains two annular event sets and detects whether a continuous arc or its complementary arc lies within a certain range to extract corner points; the specific steps are as follows:
(1) Setting up two annular event sets, each containing the N most recent events;
(2) When a new event arrives, adding it to both annular event sets, computing the angles between the event and the other events in the ring, and recording the continuous arc or its complementary arc range;
(3) Checking whether the continuous arc or complementary arc corresponding to each newly added event lies within a certain range, and if so, marking the event as a corner point;
(4) Pruning the oldest event from the two annular event sets to ensure that each set contains only the most recent N events.
Step 3: track the extracted feature points using the LK optical flow method; if tracking succeeds, triangulate to generate map points and continue tracking them; if tracking fails, discard the feature points; and extract new feature points when the number of feature points falls below a set threshold (a tracking sketch follows).
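A minimal OpenCV sketch of this tracking loop, assuming the time surfaces are rendered to 8-bit images for LK; the top-up detector below is OpenCV's generic corner detector standing in for the Arc detector, and the thresholds are illustrative:

```python
import cv2
import numpy as np

def track_features(ts_prev, ts_curr, pts_prev, min_features=80):
    """Track features between two time-surface images with pyramidal LK;
    drop failed tracks and top up with new corners when too few remain."""
    prev8 = cv2.normalize(ts_prev, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    curr8 = cv2.normalize(ts_curr, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    pts_curr, status, _ = cv2.calcOpticalFlowPyrLK(
        prev8, curr8, pts_prev.reshape(-1, 1, 2).astype(np.float32), None)
    tracked = pts_curr[status.ravel() == 1].reshape(-1, 2)
    if len(tracked) < min_features:      # top up with newly detected corners
        extra = cv2.goodFeaturesToTrack(curr8, min_features - len(tracked),
                                        qualityLevel=0.05, minDistance=10)
        if extra is not None:
            tracked = np.vstack([tracked, extra.reshape(-1, 2)])
    return tracked
```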
Key frames are extracted during tracking and sent to the joint optimization module for optimization calculation. Key frame selection sets thresholds on the number of currently tracked features and on the average parallax of tracked feature points between two consecutive timestamps: a new key frame is selected when the number of tracked features falls below its threshold or the average parallax exceeds its threshold.
Image data acquired by the traditional camera is preprocessed first, e.g. by undistortion. The traditional camera tracking module then processes the image:
performing landmark recognition on the preprocessed picture, and realizing the landmark recognition by means of a YOLOV5 target detection algorithm (the YOLOV5 model needs to be trained in advance), wherein the method comprises the following specific steps of:
(1) Control the cart carrying the traditional camera to acquire an image dataset in the scene containing the landmarks;
(2) Extract part of the images from the acquired dataset for target labeling, and label the landmarks;
(3) Divide the labeled images into a training set and a validation set, and train on the basis of a pre-trained network;
(4) After model training is completed, the model can be used for landmark detection (see the inference sketch below).
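A minimal inference sketch, assuming the ultralytics/yolov5 torch.hub interface; the weights file name and confidence threshold are illustrative:

```python
import torch

# Load the weights produced by the fine-tuning run described above.
model = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt')
model.conf = 0.5                            # discard low-confidence detections

def detect_landmarks(frame_bgr):
    """Return (x1, y1, x2, y2, conf, cls) rows for landmarks in one frame."""
    results = model(frame_bgr[..., ::-1])   # hub models expect RGB input
    return results.xyxy[0].cpu().numpy()
```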
FAST corner points are likewise extracted on the standard image frames. Since feature points belonging to dynamic objects may introduce errors in the subsequent tracking process, the dynamic feature points must be removed. The specific steps of removing the dynamic points are as follows:
(1) Process each frame of the image with the YOLOv5 algorithm and identify all objects in the image;
(2) Store the rectangular region where each potential dynamic object is located in the detection result in the format (x_1, y_1, h, w), where x_1 is the abscissa of the upper-left corner of the region, y_1 is the ordinate of the upper-left corner, h is the height of the rectangular region, and w is its width;
(3) Divide the feature points in the potential moving object regions into static and dynamic feature points through static consistency, and finally eliminate the dynamic feature points. For feature points i, j in the image frame at time t and the corresponding feature points in the image frame at time t+1, the distance between two feature points in the same image frame is defined as

d_t(i, j) = \sqrt{(u_i - u_j)^2 + (v_i - v_j)^2}

where u, v are the abscissa and ordinate of a feature point in the pixel coordinate system. By the consistency of static objects, if two feature points are static, the distances between the corresponding feature points in image frames at different times are equal; allowing for errors, the difference of these distances must lie within a certain threshold. The threshold is determined from the feature points in non-potential-moving-object regions:

\varepsilon = \frac{1}{N} \sum_{i, j \in S} \left| d_{t+1}(i, j) - d_t(i, j) \right|

where S is the set of feature points in non-potential-moving-object regions, d_t(i, j) is the distance between any two feature points of S at time t, and N is the number of feature points in S.
The obtained static feature points are tracked using the LK optical flow method. Key frames are selected during tracking by setting thresholds on the number of tracked feature points and on the distance from the last key frame: a new key frame is selected when the number of tracked feature points falls below the set threshold or the distance from the last key frame exceeds the set threshold.
If a landmark was identified in the object detection step, the landmark is located in the image once recognition is complete. Since the 3D position of the landmark itself is known, a group of corresponding 2D-3D point pairs is obtained; solving with the PnP algorithm yields the pose of the current moment relative to the landmark, which is then converted into the pose relative to the world coordinate system: T_{WC} = T_{WL} \cdot T_{LC}. The obtained result is used to update the pose of the current image.
A least squares problem is constructed from the information provided by the event camera tracking module and the traditional camera tracking module, the objective function is minimized and solved using the g2o optimization library, and the jointly optimized pose estimate is output. The whole optimization is based on key frames and uses a sliding window strategy: only the window containing the last K key frames is optimized, and IMU measurements are used to propagate the predicted sensor state between frames.
The objective function is:

\chi^{*} = \arg\min_{\chi} \Big( \| r_p \|^{2} + \sum W_{B} \| r_{B} \|^{2} + \sum W_{E} \| r_{E} \|^{2} + \sum W_{C} \| r_{C} \|^{2} \Big)

wherein χ on the left side of the equation is the variable to be estimated; on the right side, the first term is the prior residual after marginalization, the second term is the residual of the IMU pre-integration, the third term is the measurement residual of the event camera, and the fourth term is the measurement residual of the traditional camera; W_(·) is the weight corresponding to each residual.
By combining an event camera, a traditional camera, an IMU and natural road signs, the invention solves the failure of traditional visual-inertial odometers in high-speed, high-dynamic scenes, and realizes a function similar to loop-closure detection by arranging landmarks with known position information in the scene in advance. In addition, a step of eliminating dynamic points in the environment is added to the traditional camera tracking module, which improves the robustness of the system.
The technical means disclosed in the scheme of the invention are not limited to those disclosed in the above embodiment, and also include technical solutions formed by any combination of the above technical features. It should be noted that modifications and adaptations may occur to those skilled in the art without departing from the principles of the present invention, and these are also to be regarded as within the scope of the present invention.

Claims (10)

1. The method for realizing the odometer based on the event camera/IMU/natural road sign is characterized by comprising the following steps:
step one, event camera tracking
According to the event stream output by the event camera, extracting feature points directly on the raw events based on the SAE (Surface of Active Events), tracking the extracted feature points by the LK optical flow method, calculating the inverse depth of the extracted feature points to generate map points, and further tracking the map points;
step two, traditional camera tracking
Extracting feature points on the standard image frames and tracking them, with a YOLOv5 object detection model added to identify preset natural landmarks, and, when a preset natural landmark is detected, calculating the pose by the PnP algorithm and updating the pose of the current frame;
step three, joint optimization
Realizing visual-inertial fusion through nonlinear optimization by converting the fusion problem into the minimization of an objective function; the objective function comprises four parts: the prior error after marginalization, the reprojection error of the event camera, the reprojection error of the traditional camera, and the inertial measurement error term; an accurate pose estimate is obtained after solving with the g2o optimization library.
2. The event camera/IMU/natural road sign based odometer implementation method of claim 1, wherein the event is represented using a Time Surface (TS) with polarity:

T_p(x, y, t) = p \cdot e^{-(t - S(x, y))/\tau}

wherein T_p is the intensity of the event, x, y are pixel coordinates, t is the time at which the event occurred, p represents the polarity of the pixel intensity change, S(x, y) is the timestamp of the last event triggered at that pixel, and τ represents a constant decay rate.
3. The method for implementing the odometer based on the event camera/IMU/natural road sign according to claim 1, wherein said step one comprises the following sub-steps:
step 1, generating SAE according to an event stream provided by an event camera, wherein the SAE records the time stamp of the last trigger event of each pixel in the event stream, the SAE is a two-dimensional array, each element in the array corresponds to one pixel, and the value of each element corresponds to the time stamp of the last trigger event;
step 2, when a new event stream is received, extracting corner points on the generated SAE by using the Arc algorithm, the Arc algorithm maintaining two annular event sets and detecting whether a continuous arc or its complementary arc lies within a certain range to extract the corner points;
and step 3, tracking the extracted feature points by the LK optical flow method, triangulating to generate map points and tracking them further if tracking succeeds, discarding the feature points if tracking fails, and extracting new feature points when the number of feature points falls below a set threshold.
4. The method for implementing the odometer based on the event camera/IMU/natural road sign according to claim 3, wherein the specific steps of extracting feature points by Arc algorithm in step 2 are as follows:
(1) Setting up two annular event sets, each containing the N most recent events;
(2) When a new event arrives, adding it to both annular event sets, computing the angles between the event and the other events in the ring, and recording the continuous arc or its complementary arc range;
(3) Checking whether the continuous arc or complementary arc corresponding to each newly added event lies within a certain range, and if so, marking the event as a corner point;
(4) Pruning the oldest event from the two annular event sets to ensure that each set contains only the most recent N events.
5. The method for realizing the odometer based on the event camera/IMU/natural road sign according to claim 3, wherein key frames are extracted during the tracking process in step 3 and the extracted key frames are sent to a joint optimization module for optimization calculation; key frame selection sets thresholds on the number of currently tracked features and on the average parallax of tracked feature points between two consecutive timestamps, and a new key frame is selected when the number of tracked features falls below its threshold or the average parallax exceeds its threshold.
6. The method for implementing the odometer based on the event camera/IMU/natural road sign according to claim 1, wherein the second step comprises the following sub-steps:
step 1, pre-collecting image data and training the YOLOv5 model;
step 2, performing object detection on the standard image frames of the traditional camera and identifying whether a landmark is present in the image;
step 3, extracting feature points from the standard image frame, identifying potential dynamic objects from the detection result, dividing the feature points in the potential dynamic object regions into static and dynamic feature points by static consistency, and eliminating the dynamic feature points;
step 4, tracking the extracted static feature points using the optical flow method;
and step 5, if a landmark was identified in step 2, calculating the pose using the PnP algorithm and updating the camera pose at that moment.
7. The method for implementing the odometer based on the event camera/IMU/natural road sign according to claim 6, wherein the specific steps of eliminating the dynamic feature points in the step 3 are as follows:
(1) Processing each frame of the image with the YOLOv5 algorithm and identifying all objects in the image;
(2) Storing the rectangular region where each potential dynamic object is located in the detection result in the format (x_1, y_1, h, w), where x_1 is the abscissa of the upper-left corner of the region, y_1 is the ordinate of the upper-left corner, h is the height of the rectangular region, and w is its width;
(3) Dividing the feature points in the potential moving object region into static feature points and dynamic feature points through static consistency, and finally eliminating the dynamic feature points; the static feature points and the dynamic feature points are distinguished by the following steps:
for feature points i, j in the image frame at time t and the corresponding feature points in the image frame at time t+1, the distance between two feature points in the same image frame is defined as

d_t(i, j) = \sqrt{(u_i - u_j)^2 + (v_i - v_j)^2}

wherein u, v are the abscissa and ordinate of a feature point in the pixel coordinate system; when the distances between the corresponding feature points in the image frames at different times are equal, or their difference lies within a certain threshold, the two feature points are static feature points; the threshold is determined from the feature points in non-potential-moving-object regions:

\varepsilon = \frac{1}{N} \sum_{i, j \in S} \left| d_{t+1}(i, j) - d_t(i, j) \right|

wherein S is the set of feature points in non-potential-moving-object regions, d_t(i, j) is the distance between any two feature points of S at time t, and N is the number of feature points in S.
8. The method for realizing the odometer based on the event camera/IMU/natural landmark as set forth in claim 6, wherein the specific steps of updating the pose estimate by recognizing the preset natural landmarks in step 2 and step 5 are as follows:
(1) Acquiring image data of a preset natural landmark which is arranged in a scene in advance from different angles;
(2) Labeling targets on the acquired image data, and labeling out the part of the preset natural landmark;
(3) Training the model by using the marked data set;
(4) Identifying the landmark in the image using the trained model; after successful identification, obtaining the landmark's 3D coordinates in the world coordinate system and its 2D coordinates in the image, solving with the PnP algorithm to obtain the pose estimate, and updating the pose in real time.
9. The method for implementing the odometer based on the event camera/IMU/natural road sign according to claim 1, wherein the objective function in step three is:

\chi^{*} = \arg\min_{\chi} \Big( \| r_p \|^{2} + \sum W_{B} \| r_{B} \|^{2} + \sum W_{E} \| r_{E} \|^{2} + \sum W_{C} \| r_{C} \|^{2} \Big)

wherein χ on the left side of the equation is the variable to be estimated; on the right side, the first term is the prior residual after marginalization, the second term is the residual of the IMU pre-integration, the third term is the measurement residual of the event camera, and the fourth term is the measurement residual of the traditional camera; W_(·) is the weight corresponding to each residual.
10. An event camera/IMU/natural landmark based odometer system, characterized in that it is configured to implement the event camera/traditional camera/IMU/natural landmark based visual odometer implementation method according to any one of claims 1-9, and comprises an event camera tracking module, a traditional camera tracking module and a joint optimization module; wherein:
the event camera tracking module is used for directly extracting feature points on an original event based on SAE (Surface of Active Events) according to an event stream output by the event camera, tracking the extracted feature points by using an LK optical flow method, calculating the inverse depth of the extracted feature points, generating map points, and further tracking the map points;
the traditional camera tracking module is used for extracting characteristic points on a standard image frame and tracking, adding a target detection model YOLOV5 for identifying a preset natural landmark, and calculating the pose by using a PnP algorithm and updating the pose of the current frame when the preset natural landmark is detected;
the joint optimization module is used for realizing fusion of visual inertia through nonlinear optimization, converting the fusion problem of the visual inertia into a minimization problem of an objective function, wherein the objective function comprises four parts, an priori error after marginalization, a reprojection error of an event camera and a traditional camera, and an error term of inertial measurement, and obtaining accurate pose estimation after solving by using a G2O optimization library.
CN202310771745.7A 2023-06-27 2023-06-27 Odometer system and method based on event camera/IMU/natural road sign Pending CN116804553A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310771745.7A CN116804553A (en) 2023-06-27 2023-06-27 Odometer system and method based on event camera/IMU/natural road sign

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310771745.7A CN116804553A (en) 2023-06-27 2023-06-27 Odometer system and method based on event camera/IMU/natural road sign

Publications (1)

Publication Number Publication Date
CN116804553A true CN116804553A (en) 2023-09-26

Family

ID=88079419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310771745.7A Pending CN116804553A (en) 2023-06-27 2023-06-27 Odometer system and method based on event camera/IMU/natural road sign

Country Status (1)

Country Link
CN (1) CN116804553A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117808847A (en) * 2024-02-29 2024-04-02 中国科学院光电技术研究所 Space non-cooperative target feature tracking method integrating bionic dynamic vision


Similar Documents

Publication Publication Date Title
CN111462135B (en) Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation
CN111983639B (en) Multi-sensor SLAM method based on Multi-Camera/Lidar/IMU
CN111024066B (en) Unmanned aerial vehicle vision-inertia fusion indoor positioning method
CN109166149B (en) Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU
CN106780601B (en) Spatial position tracking method and device and intelligent equipment
CN111595333A (en) Modularized unmanned vehicle positioning method and system based on visual inertial laser data fusion
CN108406731A (en) A kind of positioning device, method and robot based on deep vision
CN107167826B (en) Vehicle longitudinal positioning system and method based on variable grid image feature detection in automatic driving
CN110570453B (en) Binocular vision-based visual odometer method based on closed-loop tracking characteristics
CN111833333A (en) Binocular vision-based boom type tunneling equipment pose measurement method and system
CN112801074B (en) Depth map estimation method based on traffic camera
CN108519102B (en) Binocular vision mileage calculation method based on secondary projection
CN208323361U (en) A kind of positioning device and robot based on deep vision
CN110751123B (en) Monocular vision inertial odometer system and method
CN112419497A (en) Monocular vision-based SLAM method combining feature method and direct method
CN112418288A (en) GMS and motion detection-based dynamic vision SLAM method
CN111998862A (en) Dense binocular SLAM method based on BNN
CN112729318A (en) AGV fork truck is from moving SLAM navigation of fixed position
CN114812573B (en) Vehicle positioning method based on monocular vision feature fusion and readable storage medium
CN116804553A (en) Odometer system and method based on event camera/IMU/natural road sign
CN111915651A (en) Visual pose real-time estimation method based on digital image map and feature point tracking
CN115147344A (en) Three-dimensional detection and tracking method for parts in augmented reality assisted automobile maintenance
CN113345032B (en) Initialization map building method and system based on wide-angle camera large distortion map
CN115049910A (en) Foot type robot mapping and navigation method based on binocular vision odometer
CN112432653B (en) Monocular vision inertial odometer method based on dotted line characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination