CN110390685B - Feature point tracking method based on event camera - Google Patents

Feature point tracking method based on event camera

Info

Publication number
CN110390685B
CN110390685B (application CN201910672162.2A)
Authority
CN
China
Prior art keywords
event
module
time
template
feature points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910672162.2A
Other languages
Chinese (zh)
Other versions
CN110390685A (en)
Inventor
Shi Dianxi (史殿习)
Li Kaiyue (李凯月)
Li Ruihao (李睿豪)
Jia Han (伽晗)
Wang Mingkun (王明坤)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN201910672162.2A
Publication of CN110390685A
Application granted
Publication of CN110390685B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01C - MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/10 - Navigation by using measurements of speed or acceleration
    • G01C21/12 - Navigation by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C21/16 - Navigation by integrating acceleration or speed, i.e. inertial navigation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248 - Analysis of motion using feature-based methods involving reference images or patches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/269 - Analysis of motion using gradient-based methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras
    • G06T7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence

Abstract

The invention discloses a feature point tracking method based on an event camera, aiming at improving the accuracy of feature point tracking. The technical scheme is to construct an event-camera-based feature point tracking system composed of a data acquisition module, an initialization module, an event set selection module, a matching module, a feature point updating module and a template edge updating module. The initialization module extracts feature points and an edge map from the image frames; the event set selection module selects, from the event stream around each feature point, an event set S for that feature point; the matching module matches S with the template edges around the feature points to calculate the optical flow set G_k of the n feature points at time t_k; the feature point updating module calculates from G_k the position set FD_{k+1} of the n feature points at time t_{k+1}; the template edge updating module uses the IMU data to update PBD_k and obtain the position set PBD_{k+1} of the template edges corresponding to the n feature points at time t_{k+1}. The method improves the accuracy of tracking feature points on an event stream and prolongs the average tracking time of the feature points.

Description

Feature point tracking method based on event camera
Technical Field
The invention relates to the field of computer image processing, in particular to a method for completing tracking of feature points in an image by using an event camera.
Background
SLAM (Simultaneous Localization and Mapping) has been widely studied in recent years as an important branch of the robotics field. SLAM attempts to solve the following problem: as a robot moves in an unknown environment, how can it determine its own motion trajectory from observations of the environment, and how can it build a map of that environment? SLAM technology is the collection of techniques involved in achieving this goal. A complete SLAM system mainly comprises a front-end visual odometry part and a back-end optimization part. The visual odometry part estimates the state of the robot, mainly by one of two methods: the feature point method and the direct method. The feature point method is currently the mainstream method for estimating the robot state: feature points are extracted from an image, the feature points of different frames are matched, and the camera pose is then estimated from the matched feature point pairs. Commonly used point features include Harris corners, SIFT, SURF, ORB and HOG features. Unlike the feature point method, the direct method omits feature extraction and estimates the robot state directly from the gray-level information in the image, but this method is less mature and less robust.
However, whether the feature point method or the direct method is used, standard cameras still suffer from accuracy and robustness problems in extreme environments. The extreme cases fall into two main categories. Under high-speed camera motion, images acquired with a standard camera exhibit motion blur if the camera moves too fast. In high-dynamic-range scenes, the light intensity changes strongly and the brightness of consecutive frames differs markedly. In these extreme cases, using a standard camera severely degrades both the direct method and the feature point method. Furthermore, a standard camera cannot provide an accurate motion trajectory of the feature points between frames, and it generates redundant information in static scenes, which not only wastes storage resources but also consumes considerable extra computing resources in subsequent image processing.
The advent of bio-inspired event cameras overcomes the above limitations of standard cameras. A typical event camera is the DVS (Dynamic Vision Sensor), described by Patrick Lichtsteiner et al. in "A 128×128 120 dB 15 μs Latency Asynchronous Temporal Contrast Vision Sensor" (IEEE Journal of Solid-State Circuits, 2008, volume 43, issue 2, pages 566-576), i.e. a 128×128-pixel, 120 dB dynamic-range, 15-microsecond-latency asynchronous temporal contrast vision sensor, produced by iniVation AG of Switzerland. Unlike a standard camera, an event camera outputs only brightness changes at the pixel level: in the pixel array, whenever the intensity change at a pixel exceeds a threshold, that pixel independently generates an output called an "event". Thus, unlike a standard camera, the data output by an event camera is a spatio-temporal event stream. Thanks to its low latency and high dynamic range, the event camera has a great advantage for fast-moving cameras and fast scene motion, and it also avoids recording redundant information in slowly changing scenes. In 2014, a new event camera, DAVIS (Dynamic and Active-pixel Vision Sensor), was proposed by Christian Brandli et al. in "A 240×180 130 dB 3 μs Latency Global Shutter Spatiotemporal Vision Sensor" (IEEE Journal of Solid-State Circuits, volume 49, issue 10, pages 2333-2344), i.e. a 240×180-pixel, 130 dB dynamic-range, 3-microsecond-latency global-shutter spatio-temporal vision sensor, later produced by iniVation AG of Switzerland. DAVIS combines a standard camera and an event camera (DVS) and can output both image frames and an event stream.
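As a concrete illustration of the event-stream data format described above, the following minimal Python sketch (not part of the patent text; field names are assumptions) models a single event and an event stream as produced by a DVS/DAVIS sensor:

```python
# Illustrative sketch of an event and an event stream; field names are assumptions.
from dataclasses import dataclass
from typing import List

@dataclass
class Event:
    x: int         # pixel column where the brightness change occurred
    y: int         # pixel row
    t: float       # timestamp in seconds (microsecond-level resolution on DVS/DAVIS)
    polarity: int  # +1 for a brightness increase, -1 for a decrease

# A DAVIS sensor delivers both image frames (as a standard camera does)
# and an asynchronous, time-ordered stream of such events.
EventStream = List[Event]
```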
For feature point tracking with an event camera, Zhu et al. proposed, in "Event-based Feature Tracking with Probabilistic Data Association" (IEEE International Conference on Robotics and Automation (ICRA), 2017, pages 4465-4470), a method that extracts feature points directly from the event stream and computes the feature point optical flow from the event stream to realize feature point tracking. Since the method computes the optical flow using only the event stream, its tracking accuracy is limited. Kueng et al., in "Low-Latency Visual Odometry using Event-based Feature Tracks" (IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016, Inspec accession number 16504091), proposed a feature point tracking method that combines image frames and the event stream, using information in the image frames for geometric registration with the event stream to track the feature points. With that method, the position of a feature point is updated every time an event is received, which increases the amount of computation; the method also introduces additional errors, leading to poor tracking accuracy and preventing long-duration tracking.
Therefore, the existing method for tracking the feature points by using the event camera still has the defects of low tracking accuracy and short tracking time.
Disclosure of Invention
The invention aims to provide a feature point tracking method based on an event camera, wherein the DAVIS is used as the event camera, so that the accuracy of tracking feature points on an event stream is improved, and the average tracking time of the feature points is prolonged.
To solve this problem, the invention provides a feature point tracking method based on an event camera, the event camera being a DAVIS. The method matches edges in the image frames with events in the event stream to obtain the optical flow of the feature points; because information in both forms is used simultaneously, the calculated optical flow is more accurate, so the feature point tracking precision is improved. In addition, the invention introduces an IMU (Inertial Measurement Unit) to update the positions of the edges, making the edge positions more accurate during tracking and prolonging the average tracking time of the feature points.
The specific technical scheme is as follows:
firstly, a characteristic point tracking system based on an event camera is constructed. The characteristic point tracking system of the event camera consists of a data acquisition module, an initialization module, an event set selection module, a matching module, a characteristic point updating module and a template edge updating module.
The data acquisition module is connected with the initialization module, the event set selection module and the template edge updating module. It downloads data from the public event camera dataset "The Event-Camera Dataset and Simulator" (the dataset is acquired with a DAVIS and contains image frames, an event stream and IMU data), sends the acquired image frames to the initialization module, sends the event stream to the event set selection module, and sends the IMU data to the template edge updating module.
The initialization module is connected with the data acquisition module, the event set selection module and the matching module. The initialization module receives the image frame from the data acquisition module, extracts the feature points and the edge map from the image frame, obtains the positions of the feature points and the positions of the template edges around the feature points (the edges around the feature points in the edge map are called as the template edges corresponding to the feature points), sends the positions of the feature points to the event set selection module, and sends the positions of the template edges around the feature points to the matching module.
The event set selection module is connected with the data acquisition module, the initialization module, the feature point updating module and the matching module, receives an event stream from the data acquisition module, receives the positions of feature points from the initialization module (the first cycle) or the feature point updating module (from the second cycle), receives the optical flow of the feature points from the feature point updating module (from the second cycle), selects an event set of the feature points from the event stream around each feature point, and sends the position of each feature point and the corresponding event set to the matching module.
The matching module is connected with the initialization module, the event set selection module, the feature point updating module and the template edge updating module, receives the positions of the feature points and the event sets corresponding to the feature points from the event set selection module, receives the positions of template edges around the feature points from the initialization module (the first circulation) or the template edge updating module (from the second circulation), matches the event sets of the feature points with the template edges around the feature points, calculates the optical flow of each feature point, and sends the positions of the feature points, the positions of the template edges around the feature points and the optical flow of the feature points to the feature point updating module.
The feature point updating module is connected with the matching module, the template edge updating module and the event set selection module, receives the positions of the feature points, the positions of template edges around the feature points and the optical flows of the feature points from the matching module, calculates the new positions of the feature points by using the optical flows of the feature points, sends the positions of the template edges around the feature points and the new positions of the feature points to the template edge updating module, sends the optical flows of the feature points and the new positions of the feature points to the event set selection module, and outputs the new positions of the feature points (namely the tracking results of the feature points) for a user to check.
The template edge updating module is connected with the data acquisition module, the feature point updating module and the matching module, receives the positions of the template edges around the feature points and the new positions of the feature points from the feature point updating module, receives IMU data from the data acquisition module, updates the positions of the template edges around the feature points through the IMU data, and sends the updated positions of the template edges to the matching module.
In the second step, the data acquisition module acquires image frames, the event stream and IMU data from the event camera dataset "The Event-Camera Dataset and Simulator".
In the third step, the event-camera-based feature point tracking system tracks the feature points in the event stream obtained by the data acquisition module over the period starting at time t_0 and ending at time t_N. The tracking interval [t_0, t_N] is divided into a series of sub-intervals [t_0, t_1], ..., [t_k, t_{k+1}], ..., [t_{N-1}, t_N], where the (k+1)-th sub-interval is denoted [t_k, t_{k+1}]. N is the number of sub-intervals; it is determined by the duration of the event stream and satisfies N ≥ 1. The tracking process is as follows:
3.1 Let the time index k = 0;
3.2 The initialization module performs the initialization operation, with the following specific steps:
3.2.1 The initialization module extracts feature points from the image frame obtained by the data acquisition module using the Harris corner detection method ("A Combined Corner and Edge Detector", C. G. Harris et al., Alvey Vision Conference, 1988, volume 15, issue 50) and puts the extracted feature points into a set FD = {f_1, ..., f_i, ..., f_n}, where f_i is the i-th detected feature point and n is the number of feature points. The position of a feature point is regarded as a function of time. The positions of the n feature points of FD at time t_0 are placed in the feature point position set FD_0 = {f_1(t_0), ..., f_i(t_0), ..., f_n(t_0)}, where f_i(t_0) is the position of feature point f_i at time t_0. FD_0 is sent to the event set selection module.
3.2.2 The initialization module extracts an edge map from the image frame obtained by the data acquisition module using the Canny edge detection method ("A Computational Approach to Edge Detection", John Canny, reprinted in Readings in Computer Vision, 1987, pages 184-203); each image frame corresponds to one edge map.
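For illustration, the two initialization operations above (Harris corners and a Canny edge map) can be sketched with OpenCV as follows. This is a hedged sketch: the parameter values (corner count, quality level, Canny thresholds) are assumptions, not values taken from the patent.

```python
# Minimal initialization sketch using OpenCV; assumes `frame` is a grayscale uint8 array.
import cv2
import numpy as np

def initialize(frame: np.ndarray, n_max: int = 50):
    """Extract Harris-style corners (FD_0) and a Canny edge map from one image frame."""
    # goodFeaturesToTrack with useHarrisDetector=True scores corners with the Harris response.
    corners = cv2.goodFeaturesToTrack(frame, maxCorners=n_max, qualityLevel=0.01,
                                      minDistance=10, useHarrisDetector=True, k=0.04)
    fd0 = [] if corners is None else [tuple(c.ravel()) for c in corners]  # positions f_i(t_0)
    edge_map = cv2.Canny(frame, threshold1=50, threshold2=150)            # binary edge image
    return fd0, edge_map
```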
3.2.3 The initialization module selects the template edge corresponding to each of the n feature points in FD (a template edge is represented by the set of all template points on it), and sends PBD_0, the set of positions at time t_0 of the template edges corresponding to the n feature points in FD, to the matching module. The steps are as follows:
3.2.3.1 Let i = 1;
3.2.3.2 Take the position f_i(t_0) of feature point f_i at time t_0 as the center and select a rectangular region H_{f_i} of size s × s around f_i(t_0), i.e. the side length of H_{f_i} is s, where s ranges from 20 to 30 pixels. The part of the edge map detected in 3.2.2 that lies inside H_{f_i} is taken as the template edge of f_i, and the pixel points on this template edge are the template points of f_i. The set of template points inside H_{f_i} is defined as

$$ B_{f_i} = \{ p_1(t_0), \ldots, p_j(t_0), \ldots, p_{m_i}(t_0) \}, \quad p_j(t_0) \in H_{f_i}, $$

i.e. the template edge corresponding to f_i, and B_{f_i} is put into the template edge set PB. Here p_j is the j-th template point of f_i; p_j is regarded as a function of time, p_j(t_0) denotes the position of p_j at time t_0, and p_j(t_0) ∈ H_{f_i} means that p_j lies inside the rectangular region H_{f_i} at time t_0; m_i is the number of template points in the set B_{f_i}.
3.2.3.3 Let i = i + 1;
3.2.3.4 If i ≤ n, go to 3.2.3.2; otherwise the template edge set PB = {B_{f_1}, ..., B_{f_i}, ..., B_{f_n}}, formed by the template edges corresponding to the n feature points in FD, has been obtained. The set of positions at time t_0 of the template points in B_{f_i} is denoted B_{f_i}(t_0), and the set of positions at time t_0 of the template points of the n template edges in PB is PBD_0 = {B_{f_1}(t_0), ..., B_{f_i}(t_0), ..., B_{f_n}(t_0)}. Send PBD_0 to the matching module and go to the fourth step.
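A minimal sketch of step 3.2.3 under the notation above: for each feature point, the edge pixels inside an s × s window H_{f_i} become its template points. The default window size and the array layout are illustrative assumptions.

```python
# Sketch of template edge selection; assumes fd0 is a list of (x, y) feature positions
# and edge_map is a binary image from the Canny detector.
import numpy as np

def select_template_edges(fd0, edge_map: np.ndarray, s: int = 25):
    """Return PBD_0: for each feature point, the t_0 positions of its template points."""
    h, w = edge_map.shape
    half = s // 2
    pbd0 = []
    for (fx, fy) in fd0:
        x0, x1 = max(int(fx) - half, 0), min(int(fx) + half + 1, w)
        y0, y1 = max(int(fy) - half, 0), min(int(fy) + half + 1, h)
        ys, xs = np.nonzero(edge_map[y0:y1, x0:x1])       # edge pixels inside H_{f_i}
        template_points = np.stack([xs + x0, ys + y0], axis=1).astype(float)
        pbd0.append(template_points)                       # B_{f_i}(t_0), shape (m_i, 2)
    return pbd0
```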
In the fourth step, the event set selection module receives the event stream from the data acquisition module, receives data from the initialization module or from the feature point updating module depending on the value of k, selects the event set S of the feature points from the event stream around the n feature points, and sends FD_k and the event set S of the feature points to the matching module. The method is as follows:
4.1 If k = 0, the event set selection module receives the event stream from the data acquisition module and receives from the initialization module the position set FD_k of the n feature points at time t_k, FD_k = {f_1(t_k), ..., f_i(t_k), ..., f_n(t_k)} (at this time FD_0 = {f_1(t_0), ..., f_i(t_0), ..., f_n(t_0)}). Let t_1 = t_0 + 1, in seconds, and go to 4.3;
4.2 If k ≥ 1, the event set selection module receives the event stream from the data acquisition module and receives from the feature point updating module the position set FD_k of the n feature points at time t_k and the optical flow set G_{k-1} of the n feature points at time t_{k-1},

$$ G_{k-1} = \{ v_1(t_{k-1}), \ldots, v_i(t_{k-1}), \ldots, v_n(t_{k-1}) \}, $$

where v_i(t_{k-1}) denotes the optical flow of feature point f_i on the sub-interval [t_{k-1}, t_k], obtained from the feature point updating module. The end point t_{k+1} of the estimated sub-interval [t_k, t_{k+1}] is calculated according to formula (2):

$$ t_{k+1} = t_k + \frac{3}{\left\| \frac{1}{n} \sum_{i=1}^{n} v_i(t_{k-1}) \right\|} \qquad (2) $$

where the denominator is the norm of the average optical flow of all feature points on the sub-interval [t_{k-1}, t_k]. The physical meaning of calculating t_{k+1} by formula (2) is that the time the feature points needed in the previous sub-interval to move 3 pixels on average is taken as the predicted size of the current interval [t_k, t_{k+1}]. Go to 4.3;
4.3 Around each position in the position set FD_k of the n feature points at time t_k, select the event set corresponding to that feature point:
4.3.1 Let i = 1;
4.3.2 Select from the event stream the events satisfying formula (3), put them into the event set S_{f_i}(t_k) corresponding to feature point f_i at time t_k, and put S_{f_i}(t_k) into the event set S:

$$ S_{f_i}(t_k) = \{ e_d = (x_d, t_{e_d}) \mid x_d \in H_{f_i}(t_k),\ t_{e_d} \in [t_k, t_{k+1}] \} \qquad (3) $$

S_{f_i}(t_k) denotes the set of events in a three-dimensional spatio-temporal region whose spatial extent is the rectangular region H_{f_i}(t_k) around the position f_i(t_k) of feature point f_i at time t_k and whose time range is the interval [t_k, t_{k+1}]. e_d denotes the d-th event in S_{f_i}(t_k); e_d comes from the event stream and lies within the spatio-temporal region specified by formula (3), d = 1, 2, ..., z_i, where z_i is the number of events in S_{f_i}(t_k). An event e_d is expressed as e_d = (x_d, t_{e_d}), where x_d is the coordinate of event e_d in the pixel coordinate system, t_{e_d} is the time at which event e_d occurred, and x_d ∈ H_{f_i}(t_k) means that the pixel coordinate of e_d lies inside H_{f_i}(t_k).
4.3.3 Let i = i + 1;
4.3.4 If i ≤ n, go to 4.3.2; otherwise the event sets S corresponding to the n feature points of FD at time t_k have been obtained, S = {S_{f_1}(t_k), ..., S_{f_i}(t_k), ..., S_{f_n}(t_k)}. Send FD_k and S to the matching module and go to the fifth step.
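The following sketch illustrates the fourth step: estimating the next sub-interval length from the previous average optical flow (formula (2)) and gathering the events inside each feature point's spatio-temporal window (formula (3)). It reuses the Event type sketched earlier; the data layouts are assumptions.

```python
# Sketch of the fourth step; flows_prev is an (n, 2) array of v_i(t_{k-1}).
import numpy as np

def estimate_t_next(t_k: float, flows_prev: np.ndarray, pixels: float = 3.0) -> float:
    """t_{k+1} = t_k + pixels / ||mean optical flow over [t_{k-1}, t_k]||  (formula (2))."""
    speed = np.linalg.norm(flows_prev.mean(axis=0))
    return t_k + pixels / max(speed, 1e-6)              # guard against zero average flow

def select_events(events, fd_k, t_k, t_k1, s: int = 25):
    """For each f_i(t_k), gather events with pixel coordinate in H_{f_i} and time in [t_k, t_{k+1}]."""
    half = s / 2.0
    S = []
    for (fx, fy) in fd_k:
        S_fi = [e for e in events
                if t_k <= e.t <= t_k1
                and abs(e.x - fx) <= half and abs(e.y - fy) <= half]
        S.append(S_fi)                                   # S_{f_i}(t_k), formula (3)
    return S
```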
In the fifth step, the matching module receives FD_k and S from the event set selection module. If k = 0, it receives the position set PBD_k of the template edges corresponding to the n feature points from the initialization module and goes to the sixth step; if k ≥ 1, it receives the position set PBD_k of the template edges corresponding to the n feature points from the template edge updating module and goes to the sixth step.
In the sixth step, the matching module matches S with the template edges around the feature points, calculates the optical flow set G_k of the n feature points at time t_k, and sends the position set FD_k of the n feature points, the position set PBD_k of the template edges corresponding to the n feature points, and the optical flow set G_k to the feature point updating module. The specific method is:
6.1 Let i = 1;
6.2 Construct a matching error function for feature point f_i:
For an event e_d in S_{f_i}(t_k), correct the position of event e_d at time t_k by the formula

$$ x'_d = x_d - v_i(t_k)\,(t_{e_d} - t_k) \qquad (4) $$

where x'_d is the calculated position of event e_d at time t_k, called the motion-corrected position of event e_d, and v_i(t_k) is the optical flow of feature point f_i on the sub-interval [t_k, t_{k+1}]. The product symbol has the same meaning wherever it appears below and is omitted where no ambiguity arises. For convenience of presentation, the symbol Δt_d = t_{e_d} − t_k is defined. The matching error function is constructed as follows:

$$ \varepsilon\big(v_i(t_k)\big) = \sum_{d=1}^{z_i} \sum_{j=1}^{m_i} r_{dj}\, \big\| x'_d - p_j(t_k) \big\|^2 \qquad (5) $$

where ε denotes the error and p_j(t_k) is the position of template point p_j at time t_k. r_dj is the probability that event e_d was generated by template point p_j, i.e. the probability that e_d and p_j match. Here r_dj is used as a weight: the larger r_dj, the larger the proportion of the distance between e_d and p_j at time t_k in the total error. The double vertical bars denote the norm of the vector between them, here and below. r_dj is calculated as

$$ r_{dj} = \frac{\big\| x'_d - p_j(t_k) \big\|^2}{\sum_{j'=1}^{m_i} \big\| x'_d - p_{j'}(t_k) \big\|^2} \qquad (6) $$

The physical meaning of this formula is that the numerator is the squared distance between the motion-corrected position of event e_d and template point p_j, the denominator is the sum of the squared distances between the motion-corrected position of event e_d and the m_i template points corresponding to feature point f_i, and their quotient is taken as the probability that event e_d and template point p_j match.
6.3 The EM-ICP algorithm (a matching algorithm, "Multi-scale EM-ICP: A Fast and Robust Approach for Surface Registration", Sébastien Granger et al., European Conference on Computer Vision, 2002, page 418) is adopted to minimize the matching error function constructed in step 6.2 and solve for the optimal optical flow v_i(t_k). The solving process is:
6.3.1 Initialize v_i(t_k);
6.3.2 Calculate r_dj by formula (6);
6.3.3 Update the optical flow:

$$ v_i(t_k) \leftarrow \frac{\sum_{d=1}^{z_i} \sum_{j=1}^{m_i} r_{dj}\, \big( x_d - p_j(t_k) \big)\, \Delta t_d}{\sum_{d=1}^{z_i} \sum_{j=1}^{m_i} r_{dj}\, \Delta t_d^2} \qquad (7) $$

6.3.4 Calculate the change Δv of the optical flow between the two most recent iterations, take the updated value as v_i(t_k), and put v_i(t_k) into the optical flow set G_k at time t_k.
6.3.5 If Δv ≤ σ, the final optimization result v_i(t_k) has been obtained; go to 6.4. If Δv > σ, go to 6.3.2. σ is the threshold on the change of the optical flow; its value ranges from 0.01 to 0.1, in pixels per second (pixel/s).
6.4 Let i = i + 1;
6.5 If i ≤ n, go to 6.2; otherwise the optical flows of the n feature points at time t_k have been obtained, i.e. the optical flow set G_k at time t_k, G_k = {v_1(t_k), ..., v_i(t_k), ..., v_n(t_k)}. Send FD_k, PBD_k and G_k to the feature point updating module and go to the seventh step.
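The iteration of step 6.3 can be sketched as follows for a single feature point: the weights r_dj follow formula (6), and the flow update is the closed-form least-squares minimizer implied by formulas (4) and (5). The initialization and iteration-cap details beyond the threshold σ are assumptions.

```python
# Sketch of the EM-ICP-style optical flow estimation for one feature point.
import numpy as np

def estimate_flow(event_xy, event_dt, template_xy, v0=np.zeros(2),
                  sigma=0.05, max_iter=100):
    """event_xy: (z_i, 2) event pixel positions x_d; event_dt: (z_i,) values t_ed - t_k;
    template_xy: (m_i, 2) template point positions p_j(t_k); returns v_i(t_k)."""
    v = v0.astype(float)
    for _ in range(max_iter):
        # Motion-corrected event positions x'_d = x_d - v * (t_ed - t_k)      (formula 4)
        corrected = event_xy - event_dt[:, None] * v[None, :]
        # Squared distances between every corrected event and every template point
        d2 = ((corrected[:, None, :] - template_xy[None, :, :]) ** 2).sum(-1)
        # Weights r_dj: each row normalized over the m_i template points       (formula 6)
        r = d2 / np.maximum(d2.sum(axis=1, keepdims=True), 1e-12)
        # Closed-form flow minimizing sum_d sum_j r_dj ||x_d - v*dt_d - p_j||^2 (formula 7)
        diff = event_xy[:, None, :] - template_xy[None, :, :]                 # x_d - p_j
        num = (r[:, :, None] * diff * event_dt[:, None, None]).sum(axis=(0, 1))
        den = (r * (event_dt[:, None] ** 2)).sum()
        v_new = num / max(den, 1e-12)
        if np.linalg.norm(v_new - v) <= sigma:            # convergence test against sigma
            return v_new
        v = v_new
    return v
```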
In the seventh step, the feature point updating module receives from the matching module the position set FD_k of the n feature points at time t_k, the position set PBD_k of the template edges corresponding to the n feature points at time t_k, and the optical flow set G_k of the n feature points. It calculates from the optical flow the position set FD_{k+1} of the n feature points at time t_{k+1}, sends G_k and FD_{k+1} to the event set selection module, and sends FD_{k+1} and PBD_k to the template edge updating module. The method is as follows:
7.1 Let i = 1;
7.2 Calculate the position f_i(t_{k+1}) of f_i at time t_{k+1} and put f_i(t_{k+1}) into the set FD_{k+1}:

$$ f_i(t_{k+1}) = f_i(t_k) + v_i(t_k)\,(t_{k+1} - t_k) \qquad (8) $$

where v_i(t_k)(t_{k+1} − t_k), the optical flow multiplied by the time, is the distance moved by feature point f_i from time t_k to time t_{k+1}.
7.3 Let i = i + 1;
7.4 If i ≤ n, go to 7.2; otherwise the positions of the n feature points at time t_{k+1} have been obtained, i.e. the set FD_{k+1} = {f_1(t_{k+1}), ..., f_i(t_{k+1}), ..., f_n(t_{k+1})}. Send G_k and FD_{k+1} to the event set selection module, send FD_{k+1} and PBD_k to the template edge updating module, and display the position set FD_{k+1} of the n feature points at time t_{k+1} or store it in a result file for the user to check; go to the eighth step.
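A one-line sketch of the seventh step's position update (formula (8)); the array shapes are assumptions.

```python
# Sketch: advance each feature point by its optical flow over the sub-interval.
import numpy as np

def update_feature_positions(fd_k: np.ndarray, g_k: np.ndarray,
                             t_k: float, t_k1: float) -> np.ndarray:
    """fd_k: (n, 2) positions at t_k; g_k: (n, 2) optical flows v_i(t_k); returns FD_{k+1}."""
    return fd_k + g_k * (t_k1 - t_k)
```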
In the eighth step, the template edge updating module receives the IMU data from the data acquisition module, and receives from the feature point updating module the position set FD_{k+1} of the n feature points at time t_{k+1} and the position set PBD_k of the template edges around the feature points at time t_k. It updates PBD_k using the IMU data to obtain the position set PBD_{k+1} of the template edges corresponding to the n feature points at time t_{k+1}, and sends PBD_{k+1} to the matching module. The specific method is:
8.1 Let i = 1;
8.2 Update the position at time t_{k+1} of the template edge corresponding to feature point f_i. The method is:
8.2.1 Let j = 1;
8.2.2 Define the symbol F as the point in three-dimensional space corresponding to f_i, and P_j as the point in three-dimensional space corresponding to template point p_j; F and P_j are both expressed as three-dimensional coordinates (x, y, z). Template point p_j (and likewise f_i) is expressed in homogeneous coordinates of the form (x, y, 1), whose first two dimensions are the abscissa and ordinate of p_j in the pixel coordinate system. Then the following equations hold.
At time t_k:

$$ z^{P_j}_{t_k}\, p_j(t_k) = K P_j, \qquad z^{F}_{t_k}\, f_i(t_k) = K F \qquad (9) $$

At time t_{k+1}:

$$ z^{P_j}_{t_{k+1}}\, p_j(t_{k+1}) = K \big( R P_j + t \big), \qquad z^{F}_{t_{k+1}}\, f_i(t_{k+1}) = K \big( R F + t \big) \qquad (10) $$

Here K is the intrinsic parameter matrix of the event camera, a parameter supplied with the camera. R is the rotation matrix of the event camera from time t_k to time t_{k+1} and t is the translation vector of the event camera from time t_k to time t_{k+1}; both are calculated from the obtained IMU data. z^{P_j}_{t_k} is the depth of P_j in the camera coordinate system at time t_k, z^{P_j}_{t_{k+1}} is the depth of P_j in the camera coordinate system at time t_{k+1}, z^{F}_{t_k} is the depth of F in the camera coordinate system at time t_k, and z^{F}_{t_{k+1}} is the depth of F in the camera coordinate system at time t_{k+1}.
8.2.3 Subtracting the two equations in formula (10) gives the position of template point p_j relative to f_i at time t_{k+1}, denoted Δp_j(t_{k+1}); the calculation formula is

$$ \Delta p_j(t_{k+1}) = p_j(t_{k+1}) - f_i(t_{k+1}) = \frac{1}{z^{P_j}_{t_{k+1}}} K \big( R P_j + t \big) - \frac{1}{z^{F}_{t_{k+1}}} K \big( R F + t \big) \qquad (11) $$

Because template point p_j lies close to feature point f_i in the pixel coordinate system, the corresponding spatial point P_j likewise lies close to F; in this field it is therefore assumed that P_j and F have the same depth in the camera coordinate system at time t_{k+1}, i.e. z^{P_j}_{t_{k+1}} = z^{F}_{t_{k+1}}. Substituting formula (9) into formula (11) and simplifying formula (11) yields

$$ \Delta p_j(t_{k+1}) = \frac{1}{z^{F}_{t_{k+1}}} K R K^{-1} \big( z^{P_j}_{t_k}\, p_j(t_k) - z^{F}_{t_k}\, f_i(t_k) \big) \qquad (12) $$

Considering that p_j and f_i are both expressed in homogeneous coordinates, formula (12) is further simplified to obtain the formula for the relative position of the template point at time t_{k+1}:

$$ \Delta p_j(t_{k+1}) = \mathrm{Nor}\big( K R K^{-1} p_j(t_k) \big) - \mathrm{Nor}\big( K R K^{-1} f_i(t_k) \big) \qquad (13) $$

The symbol Nor() denotes the homogeneous operation, i.e. the coordinates in the parentheses are converted into homogeneous coordinates. The updated relative position Δp_j(t_{k+1}) of template point p_j with respect to f_i at time t_{k+1} is obtained by formula (13); the position p_j(t_{k+1}) of template point p_j at time t_{k+1} is then obtained according to formula (14) and put into the position set B_{f_i}(t_{k+1}) of the template edge around feature point f_i at time t_{k+1}:

$$ p_j(t_{k+1}) = f_i(t_{k+1}) + \Delta p_j(t_{k+1}) \qquad (14) $$

8.2.4 Let j = j + 1;
8.2.5 If j ≤ m_i, go to 8.2.2; otherwise B_{f_i}(t_{k+1}) = {p_1(t_{k+1}), ..., p_j(t_{k+1}), ..., p_{m_i}(t_{k+1})} has been obtained. Put B_{f_i}(t_{k+1}) into the position set PBD_{k+1} of the template edges around the n feature points at time t_{k+1} and go to 8.3;
8.3 Let i = i + 1;
8.4 If i ≤ n, go to 8.2; otherwise PBD_{k+1} = {B_{f_1}(t_{k+1}), ..., B_{f_i}(t_{k+1}), ..., B_{f_n}(t_{k+1})} has been obtained. Send PBD_{k+1} to the matching module and go to the ninth step.
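The eighth step can be sketched as a rotation-only warp of the template points with K R K^(-1), followed by re-attachment to the updated feature position (formula (14)). The exact form used here for the relative-position formula (13) is a reconstruction and should be read as an assumption; R is assumed to come from integrating the IMU gyroscope over [t_k, t_{k+1}].

```python
# Sketch of the IMU-based template edge update under the equal-depth assumption.
import numpy as np

def warp_homogeneous(K: np.ndarray, R: np.ndarray, pt: np.ndarray) -> np.ndarray:
    """Apply Nor(K R K^-1 [x, y, 1]^T) and return the 2-D pixel coordinate."""
    q = K @ R @ np.linalg.inv(K) @ np.array([pt[0], pt[1], 1.0])
    return q[:2] / q[2]                                   # homogeneous normalization Nor()

def update_template_edge(K, R, f_i_tk, f_i_tk1, template_tk):
    """template_tk: (m_i, 2) positions p_j(t_k); returns the p_j(t_{k+1}) for all j."""
    f_warp = warp_homogeneous(K, R, f_i_tk)
    updated = []
    for p in template_tk:
        delta = warp_homogeneous(K, R, p) - f_warp        # relative position, formula (13)
        updated.append(f_i_tk1 + delta)                   # formula (14)
    return np.array(updated)
```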
In the ninth step, let k = k + 1.
In the tenth step, if k < N, go to the fourth step; otherwise, end.
The invention can achieve the following technical effects:
1. The invention uses both kinds of data output by DAVIS, image frames and the event stream: it first extracts feature points with the Harris corner detection method and an edge map with the Canny edge detection method, then selects a spatio-temporal window from the event stream, matches the events in the window with the template points on the edge map using the EM-ICP algorithm, estimates the optical flow of the feature points, and tracks the feature points through the optical flow.
2. When the template is updated, the IMU is used to update the positions of the template points, which improves the tracking accuracy of the template edges during tracking, thereby improving the accuracy of the calculated optical flow and further improving the tracking accuracy of the feature points.
The invention has been experimentally verified on the event camera dataset "The Event-Camera Dataset and Simulator" released by the University of Zurich, and compared with the feature point tracking methods proposed by Zhu et al. and Kueng et al.
Drawings
FIG. 1 is a general flow diagram of the present invention;
FIG. 2 is a logic structure diagram of a feature point tracking system based on an event camera according to a first step of the present invention;
FIG. 3 is a comparison experiment result of average tracking accuracy errors between the present invention and the existing tracking method;
FIG. 4 is a comparison experiment result of the average tracking time of the present invention and the prior tracking method.
Detailed Description
As shown in FIG. 1, the present invention comprises the following steps:
firstly, a characteristic point tracking system based on an event camera is constructed. The feature point tracking system of the event camera is shown in fig. 2 and comprises a data acquisition module, an initialization module, an event set selection module, a matching module, a feature point updating module and a template edge updating module.
The data acquisition module is connected with the initialization module, the event set selection module and the template edge updating module. It downloads data from the public event camera dataset "The Event-Camera Dataset and Simulator" of the University of Zurich, sends the acquired image frames to the initialization module, sends the event stream to the event set selection module, and sends the IMU data to the template edge updating module.
The initialization module is connected with the data acquisition module, the event set selection module and the matching module. The initialization module receives the image frame from the data acquisition module, extracts the feature points and the edge map from the image frame to obtain the positions of the feature points and the positions of the template edges around the feature points, sends the positions of the feature points to the event set selection module, and sends the positions of the template edges around the feature points to the matching module.
The event set selection module is connected with the data acquisition module, the initialization module, the feature point updating module and the matching module, receives an event stream from the data acquisition module, receives the positions of feature points from the initialization module (the first cycle) or the feature point updating module (from the second cycle), receives the optical flow of the feature points from the feature point updating module (from the second cycle), selects an event set of the feature points from the event stream around each feature point, and sends the position of each feature point and the corresponding event set to the matching module.
The matching module is connected with the initialization module, the event set selection module, the feature point updating module and the template edge updating module, receives the positions of the feature points and the event sets corresponding to the feature points from the event set selection module, receives the positions of template edges around the feature points from the initialization module (the first circulation) or the template edge updating module (from the second circulation), matches the event sets of the feature points with the template edges around the feature points, calculates the optical flow of each feature point, and sends the positions of the feature points, the positions of the template edges around the feature points and the optical flow of the feature points to the feature point updating module.
The feature point updating module is connected with the matching module, the template edge updating module and the event set selection module, receives the positions of the feature points, the positions of template edges around the feature points and the optical flows of the feature points from the matching module, calculates the new positions of the feature points by using the optical flows of the feature points, sends the positions of the template edges around the feature points and the new positions of the feature points to the template edge updating module, sends the optical flows of the feature points and the new positions of the feature points to the event set selection module, and outputs the new positions of the feature points (namely the tracking results of the feature points) for a user to check.
The template edge updating module is connected with the data acquisition module, the feature point updating module and the matching module, receives the positions of the template edges around the feature points and the new positions of the feature points from the feature point updating module, receives IMU data from the data acquisition module, updates the positions of the template edges around the feature points through the IMU data, and sends the updated positions of the template edges to the matching module.
In the second step, the data acquisition module acquires image frames, the event stream and IMU data from the event camera dataset "The Event-Camera Dataset and Simulator".
In the third step, the event-camera-based feature point tracking system tracks the feature points in the event stream obtained by the data acquisition module over the period starting at time t_0 and ending at time t_N. The tracking interval [t_0, t_N] is divided into a series of sub-intervals [t_0, t_1], ..., [t_k, t_{k+1}], ..., [t_{N-1}, t_N], where the (k+1)-th sub-interval is denoted [t_k, t_{k+1}]. N is the number of sub-intervals; it is determined by the duration of the event stream and satisfies N ≥ 1. The tracking process is as follows:
3.1 Let the time index k = 0;
3.2 The initialization module performs the initialization operation, with the following specific steps:
3.2.1 The initialization module extracts feature points from the image frame obtained by the data acquisition module using the Harris corner detection method and puts the extracted feature points into a set FD = {f_1, ..., f_i, ..., f_n}, where f_i is the i-th detected feature point and n is the number of feature points. The position of a feature point is regarded as a function of time. The positions of the n feature points of FD at time t_0 are placed in the feature point position set FD_0 = {f_1(t_0), ..., f_i(t_0), ..., f_n(t_0)}, where f_i(t_0) is the position of feature point f_i at time t_0. FD_0 is sent to the event set selection module.
3.2.2 the initialization module extracts edge maps from the image frames obtained by the data acquisition module by using a Canny edge detection method, wherein each image frame corresponds to one edge map.
3.2.3 The initialization module selects the template edge corresponding to each of the n feature points in FD (a template edge is represented by the set of all template points on it), and sends PBD_0, the set of positions at time t_0 of the template edges corresponding to the n feature points in FD, to the matching module. The steps are as follows:
3.2.3.1 Let i = 1;
3.2.3.2 Take the position f_i(t_0) of feature point f_i at time t_0 as the center and select a rectangular region H_{f_i} of size s × s around f_i(t_0), i.e. the side length of H_{f_i} is s, where s ranges from 20 to 30 pixels. The part of the edge map detected in 3.2.2 that lies inside H_{f_i} is taken as the template edge of f_i, and the pixel points on this template edge are the template points of f_i. The set of template points inside H_{f_i} is defined as

$$ B_{f_i} = \{ p_1(t_0), \ldots, p_j(t_0), \ldots, p_{m_i}(t_0) \}, \quad p_j(t_0) \in H_{f_i}, $$

i.e. the template edge corresponding to f_i, and B_{f_i} is put into the template edge set PB. Here p_j is the j-th template point of f_i; p_j is regarded as a function of time, p_j(t_0) denotes the position of p_j at time t_0, and p_j(t_0) ∈ H_{f_i} means that p_j lies inside the rectangular region H_{f_i} at time t_0; m_i is the number of template points in the set B_{f_i}.
3.2.3.3 Let i = i + 1;
3.2.3.4 If i ≤ n, go to 3.2.3.2; otherwise the template edge set PB = {B_{f_1}, ..., B_{f_i}, ..., B_{f_n}}, formed by the template edges corresponding to the n feature points in FD, has been obtained. The set of positions at time t_0 of the template points in B_{f_i} is denoted B_{f_i}(t_0), and the set of positions at time t_0 of the template points of the n template edges in PB is PBD_0 = {B_{f_1}(t_0), ..., B_{f_i}(t_0), ..., B_{f_n}(t_0)}. Send PBD_0 to the matching module and go to the fourth step.
In the fourth step, the event set selection module receives the event stream from the data acquisition module, receives data from the initialization module or from the feature point updating module depending on the value of k, selects the event set S of the feature points from the event stream around the n feature points, and sends FD_k and the event set S of the feature points to the matching module. The method is as follows:
4.1 If k = 0, the event set selection module receives the event stream from the data acquisition module and receives from the initialization module the position set FD_0 of the n feature points at time t_k. Let t_1 = t_0 + 1, in seconds, and go to 4.3;
4.2 If k ≥ 1, the event set selection module receives the event stream from the data acquisition module and receives from the feature point updating module the position set FD_k of the n feature points at time t_k and the optical flow set G_{k-1} of the n feature points at time t_{k-1},

$$ G_{k-1} = \{ v_1(t_{k-1}), \ldots, v_i(t_{k-1}), \ldots, v_n(t_{k-1}) \}, $$

where v_i(t_{k-1}) denotes the optical flow of feature point f_i on the sub-interval [t_{k-1}, t_k], obtained from the feature point updating module. The end point t_{k+1} of the estimated sub-interval [t_k, t_{k+1}] is calculated according to formula (2):

$$ t_{k+1} = t_k + \frac{3}{\left\| \frac{1}{n} \sum_{i=1}^{n} v_i(t_{k-1}) \right\|} \qquad (2) $$

where the denominator is the norm of the average optical flow of all feature points on the sub-interval [t_{k-1}, t_k]. The physical meaning of calculating t_{k+1} by formula (2) is that the time the feature points needed in the previous sub-interval to move 3 pixels on average is taken as the predicted size of the current interval [t_k, t_{k+1}]. Go to 4.3;
4.3 Around each position in the position set FD_k of the n feature points at time t_k, select the event set corresponding to that feature point:
4.3.1 Let i = 1;
4.3.2 Select from the event stream the events satisfying formula (3), put them into the event set S_{f_i}(t_k) corresponding to feature point f_i at time t_k, and put S_{f_i}(t_k) into the event set S:

$$ S_{f_i}(t_k) = \{ e_d = (x_d, t_{e_d}) \mid x_d \in H_{f_i}(t_k),\ t_{e_d} \in [t_k, t_{k+1}] \} \qquad (3) $$

S_{f_i}(t_k) denotes the set of events in a three-dimensional spatio-temporal region whose spatial extent is the rectangular region H_{f_i}(t_k) around the position f_i(t_k) of feature point f_i at time t_k and whose time range is the interval [t_k, t_{k+1}]. e_d denotes the d-th event in S_{f_i}(t_k), d = 1, 2, ..., z_i, where z_i is the number of events in S_{f_i}(t_k). An event e_d is expressed as e_d = (x_d, t_{e_d}), where x_d is the coordinate of event e_d in the pixel coordinate system, t_{e_d} is the time at which event e_d occurred, and x_d ∈ H_{f_i}(t_k) means that the pixel coordinate of e_d lies inside H_{f_i}(t_k).
4.3.3 Let i = i + 1;
4.3.4 If i ≤ n, go to 4.3.2; otherwise the event sets S corresponding to the n feature points of FD at time t_k have been obtained, S = {S_{f_1}(t_k), ..., S_{f_i}(t_k), ..., S_{f_n}(t_k)}. Send FD_k and S to the matching module and go to the fifth step.
In the fifth step, the matching module receives FD_k and S from the event set selection module. If k = 0, it receives the position set PBD_k of the template edges corresponding to the n feature points from the initialization module and goes to the sixth step; if k ≥ 1, it receives the position set PBD_k of the template edges corresponding to the n feature points from the template edge updating module and goes to the sixth step.
In the sixth step, the matching module matches S with the template edges around the feature points, calculates the optical flow set G_k of the n feature points at time t_k, and sends the position set FD_k of the n feature points, the position set PBD_k of the template edges corresponding to the n feature points, and the optical flow set G_k to the feature point updating module. The specific method is:
6.1 Let i = 1;
6.2 Construct a matching error function for feature point f_i:
For an event e_d in S_{f_i}(t_k), correct the position of event e_d at time t_k by the formula

$$ x'_d = x_d - v_i(t_k)\,(t_{e_d} - t_k) \qquad (4) $$

where x'_d is the calculated position of event e_d at time t_k, called the motion-corrected position of event e_d, and v_i(t_k) is the optical flow of feature point f_i on the sub-interval [t_k, t_{k+1}]. The product symbol has the same meaning wherever it appears below and is omitted where no ambiguity arises. For convenience of presentation, the symbol Δt_d = t_{e_d} − t_k is defined. The matching error function is constructed as follows:

$$ \varepsilon\big(v_i(t_k)\big) = \sum_{d=1}^{z_i} \sum_{j=1}^{m_i} r_{dj}\, \big\| x'_d - p_j(t_k) \big\|^2 \qquad (5) $$

where ε denotes the error and p_j(t_k) is the position of template point p_j at time t_k. r_dj is the probability that event e_d was generated by template point p_j, i.e. the probability that e_d and p_j match; here r_dj is used as a weight. r_dj is calculated as

$$ r_{dj} = \frac{\big\| x'_d - p_j(t_k) \big\|^2}{\sum_{j'=1}^{m_i} \big\| x'_d - p_{j'}(t_k) \big\|^2} \qquad (6) $$

6.3 The EM-ICP algorithm is adopted to minimize the matching error function constructed in step 6.2 and solve for the optimal optical flow v_i(t_k). The solving process is:
6.3.1 Initialize v_i(t_k);
6.3.2 Calculate r_dj by formula (6);
6.3.3 Update the optical flow:

$$ v_i(t_k) \leftarrow \frac{\sum_{d=1}^{z_i} \sum_{j=1}^{m_i} r_{dj}\, \big( x_d - p_j(t_k) \big)\, \Delta t_d}{\sum_{d=1}^{z_i} \sum_{j=1}^{m_i} r_{dj}\, \Delta t_d^2} \qquad (7) $$

6.3.4 Calculate the change Δv of the optical flow between the two most recent iterations, take the updated value as v_i(t_k), and put v_i(t_k) into the optical flow set G_k at time t_k.
6.3.5 If Δv ≤ σ, the final optimization result v_i(t_k) has been obtained; go to 6.4. If Δv > σ, go to 6.3.2. σ is the threshold on the change of the optical flow; its value ranges from 0.01 to 0.1, in pixels per second.
6.4 Let i = i + 1;
6.5 If i ≤ n, go to 6.2; otherwise the optical flows of the n feature points at time t_k have been obtained, i.e. the optical flow set G_k at time t_k, G_k = {v_1(t_k), ..., v_i(t_k), ..., v_n(t_k)}. Send FD_k, PBD_k and G_k to the feature point updating module and go to the seventh step.
In the seventh step, the feature point updating module receives from the matching module the position set FD_k of the n feature points at time t_k, the position set PBD_k of the template edges corresponding to the n feature points at time t_k, and the optical flow set G_k of the n feature points. It calculates from the optical flow the position set FD_{k+1} of the n feature points at time t_{k+1}, sends G_k and FD_{k+1} to the event set selection module, and sends FD_{k+1} and PBD_k to the template edge updating module. The method is as follows:
7.1 Let i = 1;
7.2 Calculate the position f_i(t_{k+1}) of f_i at time t_{k+1} and put f_i(t_{k+1}) into the set FD_{k+1}:

$$ f_i(t_{k+1}) = f_i(t_k) + v_i(t_k)\,(t_{k+1} - t_k) \qquad (8) $$

where v_i(t_k)(t_{k+1} − t_k), the optical flow multiplied by the time, is the distance moved by feature point f_i from time t_k to time t_{k+1}.
7.3 Let i = i + 1;
7.4 If i ≤ n, go to 7.2; otherwise the positions of the n feature points at time t_{k+1} have been obtained, i.e. the set FD_{k+1} = {f_1(t_{k+1}), ..., f_i(t_{k+1}), ..., f_n(t_{k+1})}. Send G_k and FD_{k+1} to the event set selection module, send FD_{k+1} and PBD_k to the template edge updating module, and display the position set FD_{k+1} of the n feature points at time t_{k+1} or store it in a result file for the user to check; go to the eighth step.
In the eighth step, the template edge updating module receives the IMU data from the data acquisition module, and receives from the feature point updating module the position set FD_{k+1} of the n feature points at time t_{k+1} and the position set PBD_k of the template edges around the feature points at time t_k. It updates PBD_k using the IMU data to obtain the position set PBD_{k+1} of the template edges corresponding to the n feature points at time t_{k+1}, and sends PBD_{k+1} to the matching module. The specific method is:
8.1 Let i = 1;
8.2 Update the position at time t_{k+1} of the template edge corresponding to feature point f_i. The method is:
8.2.1 Let j = 1;
8.2.2 Define the symbol F as the point in three-dimensional space corresponding to f_i, and P_j as the point in three-dimensional space corresponding to template point p_j; F and P_j are both expressed as three-dimensional coordinates (x, y, z). Template point p_j (and likewise f_i) is expressed in homogeneous coordinates of the form (x, y, 1), whose first two dimensions are the abscissa and ordinate of p_j in the pixel coordinate system. Then the following equations hold.
At time t_k:

$$ z^{P_j}_{t_k}\, p_j(t_k) = K P_j, \qquad z^{F}_{t_k}\, f_i(t_k) = K F \qquad (9) $$

At time t_{k+1}:

$$ z^{P_j}_{t_{k+1}}\, p_j(t_{k+1}) = K \big( R P_j + t \big), \qquad z^{F}_{t_{k+1}}\, f_i(t_{k+1}) = K \big( R F + t \big) \qquad (10) $$

Here K is the intrinsic parameter matrix of the event camera, a parameter supplied with the camera. R is the rotation matrix of the event camera from time t_k to time t_{k+1} and t is the translation vector of the event camera from time t_k to time t_{k+1}; both are calculated from the obtained IMU data. z^{P_j}_{t_k} is the depth of P_j in the camera coordinate system at time t_k, z^{P_j}_{t_{k+1}} is the depth of P_j in the camera coordinate system at time t_{k+1}, z^{F}_{t_k} is the depth of F in the camera coordinate system at time t_k, and z^{F}_{t_{k+1}} is the depth of F in the camera coordinate system at time t_{k+1}.
8.2.3 Subtracting the two equations in formula (10) gives the position of template point p_j relative to f_i at time t_{k+1}, denoted Δp_j(t_{k+1}); the calculation formula is

$$ \Delta p_j(t_{k+1}) = p_j(t_{k+1}) - f_i(t_{k+1}) = \frac{1}{z^{P_j}_{t_{k+1}}} K \big( R P_j + t \big) - \frac{1}{z^{F}_{t_{k+1}}} K \big( R F + t \big) \qquad (11) $$

Because template point p_j lies close to feature point f_i in the pixel coordinate system, the corresponding spatial point P_j likewise lies close to F; in this field it is therefore assumed that P_j and F have the same depth in the camera coordinate system at time t_{k+1}, i.e. z^{P_j}_{t_{k+1}} = z^{F}_{t_{k+1}}. Substituting formula (9) into formula (11) and simplifying formula (11) yields

$$ \Delta p_j(t_{k+1}) = \frac{1}{z^{F}_{t_{k+1}}} K R K^{-1} \big( z^{P_j}_{t_k}\, p_j(t_k) - z^{F}_{t_k}\, f_i(t_k) \big) \qquad (12) $$

Considering that p_j and f_i are both expressed in homogeneous coordinates, formula (12) is further simplified to obtain the formula for the relative position of the template point at time t_{k+1}:

$$ \Delta p_j(t_{k+1}) = \mathrm{Nor}\big( K R K^{-1} p_j(t_k) \big) - \mathrm{Nor}\big( K R K^{-1} f_i(t_k) \big) \qquad (13) $$

The symbol Nor() denotes the homogeneous operation, i.e. the coordinates in the parentheses are converted into homogeneous coordinates. The updated relative position Δp_j(t_{k+1}) of template point p_j with respect to f_i at time t_{k+1} is obtained by formula (13); the position p_j(t_{k+1}) of template point p_j at time t_{k+1} is then obtained according to formula (14) and put into the position set B_{f_i}(t_{k+1}) of the template edge around feature point f_i at time t_{k+1}:

$$ p_j(t_{k+1}) = f_i(t_{k+1}) + \Delta p_j(t_{k+1}) \qquad (14) $$

8.2.4 Let j = j + 1;
8.2.5 If j ≤ m_i, go to 8.2.2; otherwise B_{f_i}(t_{k+1}) = {p_1(t_{k+1}), ..., p_j(t_{k+1}), ..., p_{m_i}(t_{k+1})} has been obtained. Put B_{f_i}(t_{k+1}) into the position set PBD_{k+1} of the template edges around the n feature points at time t_{k+1} and go to 8.3;
8.3 Let i = i + 1;
8.4 If i ≤ n, go to 8.2; otherwise PBD_{k+1} = {B_{f_1}(t_{k+1}), ..., B_{f_i}(t_{k+1}), ..., B_{f_n}(t_{k+1})} has been obtained. Send PBD_{k+1} to the matching module and go to the ninth step.
The ninth step: let k = k + 1.
The tenth step: if k < N, turn to the fourth step; otherwise, end.
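For orientation, the eighth step can be read as warping each template point by the rotation-induced mapping K R K^{-1} and re-anchoring it at the new feature position. The following is a minimal sketch of that update under the equal-depth assumption of equations (9)–(14); it assumes the intrinsic matrix K and the IMU-derived rotation R are available as NumPy arrays, and the function name and array layouts are illustrative rather than part of the patent.

import numpy as np

def nor(v):
    """Homogeneous normalization Nor(): divide a 3-vector by its last coordinate."""
    return v / v[2]

def update_template_edge(K, R, f_i_tk, f_i_tk1, template_pts_tk):
    """Warp the template points around one feature from t_k to t_{k+1}.

    K               -- 3x3 intrinsic matrix of the event camera
    R               -- 3x3 rotation of the camera from t_k to t_{k+1} (from the IMU)
    f_i_tk          -- feature position at t_k, homogeneous (x, y, 1)
    f_i_tk1         -- feature position at t_{k+1}, homogeneous (x, y, 1)
    template_pts_tk -- (m_i, 3) array of template points at t_k, homogeneous
    Returns an (m_i, 3) array of template points at t_{k+1}.
    """
    f_i_tk = np.asarray(f_i_tk, dtype=float)
    f_i_tk1 = np.asarray(f_i_tk1, dtype=float)
    template_pts_tk = np.asarray(template_pts_tk, dtype=float)
    H = K @ R @ np.linalg.inv(K)              # rotation-induced warp K R K^{-1}
    warped_f = nor(H @ f_i_tk)                # Nor(K R K^{-1} f_i(t_k))
    out = np.zeros_like(template_pts_tk)
    for j, p_j in enumerate(template_pts_tk):
        delta = nor(H @ p_j) - warped_f       # eq. (13): relative position at t_{k+1}
        out[j] = f_i_tk1 + delta              # eq. (14): absolute position at t_{k+1}
    return out

The translation t drops out of equation (13) under the equal-depth assumption, which is why only the rotation appears in the sketch; in practice R would be obtained by integrating the IMU gyroscope measurements over [t_k, t_{k+1}].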
FIG. 3 shows the results of a comparison experiment on average tracking accuracy error between the present invention and existing tracking methods. The experimental results are obtained by testing the present invention on The Event-Camera Dataset and Simulator dataset. The experimental environment is a notebook computer configured with an i7 2.8 GHz CPU and 8 GB of RAM. The evaluation index of the experiment is the average tracking error of the feature points, in pixels. The left side of the figure lists the names of the data sequences in the dataset, and the upper right gives the average tracking error of the feature points. The three columns of experimental data in the figure are the results of the present invention, the method of Zhu et al., and the method of Kueng et al., respectively, tested on the same data sequences in the same experimental environment. The experimental results show that, compared with the other two methods, the present invention has a lower average tracking error on all test data sequences. In the figure, "x" indicates no data.
FIG. 4 shows the results of a comparison experiment on average tracking time between the present invention and existing tracking methods. The test dataset and experimental environment are identical to those of the experiment corresponding to FIG. 3. The evaluation index of the experiment is the average tracking time of the feature points, in seconds. The left side of the figure lists the names of the data sequences in the dataset, and the upper right gives the average tracking time of the feature points. The three columns of experimental data in the figure are the results of the present invention, the method of Zhu et al., and the method of Kueng et al., respectively, tested on the same data sequences in the same experimental environment. The experimental results show that, compared with the other two methods, the present invention achieves longer tracking on all data sequences except "boxes_translation" and "boxes_6dof".
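The evaluation indices used in FIG. 3 and FIG. 4 can be computed with a short routine. The sketch below, with hypothetical array names, derives the average tracking error (in pixels) of a tracked trajectory against ground-truth positions interpolated to the tracker's timestamps, and the average tracking time (in seconds) over a set of tracks; it is an illustration of the metrics only, not code from the patent.

import numpy as np

def average_tracking_error(tracked_xy, ground_truth_xy):
    """Mean Euclidean distance (pixels) between tracked and ground-truth positions."""
    tracked_xy = np.asarray(tracked_xy, dtype=float)          # shape (T, 2)
    ground_truth_xy = np.asarray(ground_truth_xy, dtype=float)
    return float(np.mean(np.linalg.norm(tracked_xy - ground_truth_xy, axis=1)))

def average_tracking_time(track_timestamps):
    """Mean lifetime (seconds) of the tracked features.

    track_timestamps -- list of 1-D arrays, one per feature, holding the
                        timestamps at which that feature was still tracked.
    """
    lifetimes = [ts[-1] - ts[0] for ts in track_timestamps if len(ts) > 1]
    return float(np.mean(lifetimes)) if lifetimes else 0.0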

Claims (5)

1. A feature point tracking method based on an event camera is characterized by comprising the following steps:
firstly, a feature point tracking system based on an event camera is constructed, wherein the feature point tracking system of the event camera consists of a data acquisition module, an initialization module, an event set selection module, a matching module, a feature point updating module and a template edge updating module;
the data acquisition module is connected with the initialization module, the event set selection module and the template edge updating module; the data acquisition module downloads data from an event camera dataset, sends the acquired image frames to the initialization module, sends the event stream to the event set selection module, and sends the IMU data, namely inertial measurement unit data, to the template edge updating module;
the initialization module is connected with the data acquisition module, the event set selection module and the matching module; the initialization module receives the image frame from the data acquisition module, extracts the feature points and the edge map from the image frame to obtain the positions of the feature points and the positions of the template edges around the feature points, sends the positions of the feature points to the event set selection module, and sends the positions of the template edges around the feature points to the matching module;
the event set selection module is connected with the data acquisition module, the initialization module, the feature point updating module and the matching module; it receives the event stream from the data acquisition module, receives the positions of the feature points from the initialization module or the feature point updating module, receives the optical flows of the feature points from the feature point updating module, selects the event set of each feature point from the event stream around that feature point, and sends the positions of the feature points and the corresponding event sets to the matching module;
the matching module is connected with the initialization module, the event set selection module, the feature point updating module and the template edge updating module, receives the positions of the feature points and the event sets corresponding to the feature points from the event set selection module, receives the positions of the template edges around the feature points from the initialization module or the template edge updating module, matches the event sets of the feature points with the template edges around the feature points, calculates the optical flow of each feature point, and sends the positions of the feature points, the positions of the template edges around the feature points and the optical flow of the feature points to the feature point updating module;
the characteristic point updating module is connected with the matching module, the template edge updating module and the event set selecting module, receives the positions of the characteristic points, the positions of template edges around the characteristic points and the optical flows of the characteristic points from the matching module, calculates the new positions of the characteristic points by using the optical flows of the characteristic points, sends the positions of the template edges around the characteristic points and the new positions of the characteristic points to the template edge updating module, sends the optical flows of the characteristic points and the new positions of the characteristic points to the event set selecting module, and outputs the new positions of the characteristic points;
the template edge updating module is connected with the data acquisition module, the feature point updating module and the matching module, receives the positions of template edges around the feature points and the new positions of the feature points from the feature point updating module, receives IMU data from the data acquisition module, updates the positions of the template edges around the feature points through the IMU data, and sends the updated positions of the template edges to the matching module;
secondly, the data acquisition module acquires image frames, event streams and IMU data from the event camera data set;
thirdly, using the feature point tracking system based on the event camera, the feature points are tracked in the event stream obtained by the data acquisition module over the period that starts at time t_0 and ends at time t_N; the time interval [t_0, t_N] is divided into a series of sub-time intervals [t_0, t_1], ..., [t_k, t_{k+1}], ..., [t_{N-1}, t_N], where N denotes the number of sub-time intervals, is determined by the time length of the event stream, and satisfies N ≥ 1; the (k+1)-th sub-time interval is denoted [t_k, t_{k+1}]; the tracking process is as follows:
3.1, making the time sequence number k equal to 0;
3.2, if k is equal to 0, the initialization module performs an initialization operation, and the method includes:
3.2.1 the initialization module extracts feature points from the image frames obtained by the data acquisition module by using the Harris corner detection method, and places the extracted feature points in a set FD, FD = {f_0, ..., f_i, ..., f_n}, where f_i denotes the i-th detected feature point and n is the number of feature points; regarding the position of each feature point as a function of time, the positions of the n feature points in FD at time t_0 are placed in the feature point position set FD_0, FD_0 = {f_0(t_0), ..., f_i(t_0), ..., f_n(t_0)}, where f_i(t_0) denotes the position of feature point f_i at time t_0; FD_0 is sent to the event set selection module;
3.2.2 the initialization module extracts edge maps from the image frames obtained by the data acquisition module by using a Canny edge detection method, wherein each image frame corresponds to one edge map;
3.2.3 the initialization module selects the template edges corresponding to the n feature points in FD, and sends PBD_0, the position set of those template edges at time t_0, to the matching module, as follows:
3.2.3.1 let i = 1;
3.2.3.2 with the position f_i(t_0) of feature point f_i at time t_0 as the center, select a rectangular area B_{f_i}(t_0) around f_i(t_0) of size s × s, i.e. both the length and the width of the rectangle are s; take the pixel points of the edge map detected in 3.2.2 that lie inside B_{f_i}(t_0) as the pixel points on the template edge of f_i, i.e. the template points of f_i; define the set of template points within B_{f_i}(t_0) as PB_{f_i} = {p_1, ..., p_j, ..., p_{m_i}}, i.e. the template edge corresponding to f_i, and put PB_{f_i} into the template edge set PB; p_j is the j-th template point of f_i, p_j is a function of time, p_j(t_0) denotes the position of p_j at time t_0 and lies within the rectangular area B_{f_i}(t_0) at time t_0, and m_i denotes the number of template points in PB_{f_i};
3.2.3.3 let i = i + 1;
3.2.3.4 if i ≤ n, turn to 3.2.3.2; otherwise the template edge set PB formed by the template edges corresponding to the n feature points in FD has been obtained, PB = {PB_{f_1}, ..., PB_{f_i}, ..., PB_{f_n}}; define the set of positions of the template points in PB_{f_i} at time t_0 as PBD_0^{f_i} = {p_1(t_0), ..., p_j(t_0), ..., p_{m_i}(t_0)}, and let the set of positions at time t_0 of the template points in the n template edges in PB be PBD_0 = {PBD_0^{f_1}, ..., PBD_0^{f_i}, ..., PBD_0^{f_n}}; send PBD_0 to the matching module and turn to the fourth step;
fourthly, the event set selection module receives the event stream from the data acquisition module, receives different data from the initialization module or the feature point updating module depending on the value of k, selects the event set S of the feature points from the event stream around the n feature points, and sends FD_k and the event set S of the feature points to the matching module, as follows:
4.1 if k = 0, the event set selection module receives the event stream from the data acquisition module and receives from the initialization module the position set FD_0 of the n feature points at time t_0; let t_1 = t_0 + 1, in seconds; turn to 4.3;
4.2 if k ≥ 1, the event set selection module receives the event stream from the data acquisition module and receives from the feature point updating module the position set FD_k of the n feature points at time t_k and the optical flow set G_{k-1} of the n feature points at time t_{k-1}, G_{k-1} = {v_1(t_{k-1}), ..., v_i(t_{k-1}), ..., v_n(t_{k-1})}; the end time t_{k+1} of the sub-time interval [t_k, t_{k+1}] is estimated from t_k and the average optical flow of all the feature points over the sub-time interval [t_{k-1}, t_k] according to equation (2), where v_i(t_{k-1}) denotes the optical flow of feature point f_i over [t_{k-1}, t_k]; turn to 4.3;
4.3 for each position in the position set FD_k of the n feature points at time t_k, select the event set corresponding to that feature point around its position, as follows:
4.3.1 let i = 1;
4.3.2 select from the event stream the events that satisfy equation (3), put them into S_{f_i}(t_k), the event set corresponding to feature point f_i at time t_k, and put S_{f_i}(t_k) into the event set S:

S_{f_i}(t_k) = { e_d | x_d ∈ B_{f_i}(t_k), τ_d ∈ [t_k, t_{k+1}] }    (3)

S_{f_i}(t_k) represents a set of events in a three-dimensional spatio-temporal region whose spatial extent is the rectangular area B_{f_i}(t_k) around the position f_i(t_k) of feature point f_i at time t_k and whose time range is the interval [t_k, t_{k+1}]; e_d denotes the d-th event in S_{f_i}(t_k), d ∈ {1, 2, ..., z_i}, where z_i denotes the number of events in S_{f_i}(t_k); x_d denotes the coordinates of event e_d in the pixel coordinate system, τ_d denotes the time at which event e_d occurs, and x_d ∈ B_{f_i}(t_k) means that the pixel coordinates of e_d lie within B_{f_i}(t_k);
4.3.3 let i = i + 1;
4.3.4 if i ≤ n, turn to 4.3.2; otherwise the event sets S corresponding to the n feature points in FD at time t_k have been obtained, S = {S_{f_1}(t_k), ..., S_{f_i}(t_k), ..., S_{f_n}(t_k)}; send FD_k and S to the matching module and turn to the fifth step;
fifthly, the matching module receives FD_k and S from the event set selection module; if k = 0, it receives the position set PBD_k of the template edges corresponding to the n feature points from the initialization module and turns to the sixth step; if k ≥ 1, it receives the position set PBD_k of the template edges corresponding to the n feature points from the template edge updating module and turns to the sixth step;
sixthly, the matching module matches S with the template edges around the feature points to calculate G_k, the optical flow set of the n feature points at time t_k, and sends the position set FD_k of the n feature points, the position set PBD_k of the template edges corresponding to the n feature points and the optical flow set G_k to the feature point updating module; the specific method is as follows:
6.1 let i = 1;
6.2 construct a matching error function for feature point f_i by the following method:

correct the position of event e_d to time t_k according to equation (4):

x'_d = x_d − (τ_d − t_k) · v_i(t_k)    (4)

where x'_d denotes the calculated position of event e_d at time t_k, abbreviated as the motion-corrected position of event e_d, v_i(t_k) denotes the optical flow of feature point f_i over the time interval [t_k, t_{k+1}], and the symbol · denotes the dot product; the matching error function is constructed as follows:

E = Σ_{d=1}^{z_i} Σ_{j=1}^{m_i} r_dj ||x'_d − p_j(t_k)||²    (5)

where E denotes the error, p_j(t_k) denotes the position of template point p_j at time t_k, and r_dj denotes the probability that e_d is generated by template point p_j, i.e. the probability that e_d and p_j match; the double vertical lines denote taking the modulus of the vector between them; r_dj is calculated according to equation (6);
6.3 minimize the matching error function using the EM-ICP algorithm to solve for the optimal optical flow v_i*(t_k), as follows:
6.3.1 initialize the optical flow v_i(t_k);
6.3.2 calculate r_dj by equation (6);
6.3.3 update the optical flow v_i(t_k) according to equation (7);
6.3.4 calculate the change Δv_i(t_k) of the optical flow, let v_i(t_k) take the updated value, and put v_i(t_k) into the optical flow set G_k at time t_k;
6.3.5 if ||Δv_i(t_k)|| < σ, the final optimization result v_i*(t_k) has been obtained, turn to 6.4; if ||Δv_i(t_k)|| ≥ σ, turn to 6.3.2; σ is the change threshold of the optical flow;
6.4 let i = i + 1;
6.5 if i ≤ n, turn to 6.2; otherwise the optical flow set G_k at time t_k has been obtained, G_k = {v_1(t_k), ..., v_i(t_k), ..., v_n(t_k)}; send FD_k, PBD_k and G_k to the feature point updating module and turn to the seventh step;
seventhly, the feature point updating module receives FD_k, PBD_k and G_k from the matching module, calculates FD_{k+1}, the position set of the n feature points at time t_{k+1}, using the optical flows, sends G_k and FD_{k+1} to the event set selection module, and sends FD_{k+1} and PBD_k to the template edge updating module, as follows:
7.1 let i = 1;
7.2 calculate the position f_i(t_{k+1}) of f_i at time t_{k+1} and put f_i(t_{k+1}) into the set FD_{k+1}: f_i(t_{k+1}) = f_i(t_k) + v_i(t_k) · (t_{k+1} − t_k), i.e. the distance moved by feature point f_i from time t_k to time t_{k+1} is its optical flow multiplied by the elapsed time;
7.3 let i = i + 1;
7.4 if i ≤ n, turn to 7.2; otherwise the positions of the n feature points at time t_{k+1} have been obtained, giving the set FD_{k+1} = {f_1(t_{k+1}), ..., f_i(t_{k+1}), ..., f_n(t_{k+1})}; send G_k and FD_{k+1} to the event set selection module, send FD_{k+1} and PBD_k to the template edge updating module, display the position set FD_{k+1} of the n feature points at time t_{k+1} or store it in a result file, and turn to the eighth step;
eighthly, the template edge updating module receives the IMU data from the data acquisition module and FD_{k+1} and PBD_k from the feature point updating module, updates PBD_k using the IMU data to obtain PBD_{k+1}, the position set of the template edges corresponding to the n feature points at time t_{k+1}, and sends PBD_{k+1} to the matching module, as follows:
8.1 let i = 1;
8.2 update the position at time t_{k+1} of the template edge corresponding to feature point f_i, as follows:
8.2.1 let j = 1;
8.2.2 define the symbol F as the point in three-dimensional space corresponding to f_i, and P_j as the point in three-dimensional space corresponding to the template point p_j; both F and P_j are expressed as three-dimensional coordinates (x, y, z); the template point p_j is expressed in homogeneous coordinates of the form (x, y, 1), where the first two dimensions are the abscissa and ordinate of p_j in the pixel coordinate system; the following equations then hold:

at time t_k,

Z_{P_j}(t_k) K^{-1} p_j(t_k) = P_j,    Z_F(t_k) K^{-1} f_i(t_k) = F    (9)

at time t_{k+1},

Z_{P_j}(t_{k+1}) K^{-1} p_j(t_{k+1}) = R P_j + t,    Z_F(t_{k+1}) K^{-1} f_i(t_{k+1}) = R F + t    (10)

where K is the internal parameter matrix of the event camera, a parameter supplied with the event camera; R is the rotation matrix of the event camera from time t_k to time t_{k+1}, and t is the translation vector of the event camera from time t_k to time t_{k+1}, both computed from the acquired IMU data; Z_{P_j}(t_k) and Z_{P_j}(t_{k+1}) are the depths of P_j in the camera coordinate system at times t_k and t_{k+1}, respectively; Z_F(t_k) and Z_F(t_{k+1}) are the depths of F in the camera coordinate system at times t_k and t_{k+1}, respectively;
8.2.3 subtracting the two equations in equation (10) gives Δp_j(t_{k+1}), the relative position of the template point p_j with respect to f_i at time t_{k+1}:

Δp_j(t_{k+1}) = p_j(t_{k+1}) − f_i(t_{k+1}) = K (R P_j + t) / Z_{P_j}(t_{k+1}) − K (R F + t) / Z_F(t_{k+1})    (11)

P_j and F are considered to have the same depth in the camera coordinate system at time t_{k+1}, i.e. Z_{P_j}(t_{k+1}) = Z_F(t_{k+1}); substituting equation (9) into equation (11) and simplifying equation (11) yields:

Δp_j(t_{k+1}) = K R K^{-1} (Z_{P_j}(t_k) p_j(t_k) − Z_F(t_k) f_i(t_k)) / Z_F(t_{k+1})    (12)

simplifying equation (12) gives the relative position of the template point at time t_{k+1}:

Δp_j(t_{k+1}) = Nor(K R K^{-1} p_j(t_k)) − Nor(K R K^{-1} f_i(t_k))    (13)

the symbol Nor() denotes the homogeneous normalization operation, i.e. the coordinates in parentheses are converted into homogeneous coordinates; the position p_j(t_{k+1}) of the template point p_j at time t_{k+1} is obtained according to equation (14):

p_j(t_{k+1}) = f_i(t_{k+1}) + Δp_j(t_{k+1})    (14)

and p_j(t_{k+1}) is put into PBD_{k+1}^{f_i}, the position set of the template edge around feature point f_i at time t_{k+1};
8.2.4 let j = j + 1;
8.2.5 if j ≤ m_i, turn to 8.2.2; otherwise PBD_{k+1}^{f_i} = {p_1(t_{k+1}), ..., p_j(t_{k+1}), ..., p_{m_i}(t_{k+1})} has been obtained; put PBD_{k+1}^{f_i} into PBD_{k+1}, the position set of the template edges around the n feature points at time t_{k+1}, and turn to 8.3;
8.3 let i = i + 1;
8.4 if i ≤ n, turn to 8.2; otherwise PBD_{k+1} has been obtained, PBD_{k+1} = {PBD_{k+1}^{f_1}, ..., PBD_{k+1}^{f_i}, ..., PBD_{k+1}^{f_n}}; send PBD_{k+1} to the matching module and turn to the ninth step;
the ninth step: let k = k + 1;
the tenth step: if k < N, turn to the fourth step; otherwise, end.
2. The event camera-based feature point tracking method according to claim 1, wherein the event camera dataset refers to "The Event-Camera Dataset and Simulator", acquired by DAVIS and comprising image frames, event streams and IMU data; DAVIS refers to a dynamic and active-pixel vision sensor.
3. The event camera-based feature point tracking method of claim 1, wherein the template edge is represented by a set of all template points on the template edge.
4. The event camera-based feature point tracking method according to claim 1, wherein s in step 3.2.3.2 ranges from 20 to 30 pixels.
5. The event camera-based feature point tracking method according to claim 1, wherein the value of σ in step 6.3.5 ranges from 0.01 to 0.1 in units of pixels per second.
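As a concrete reading of the fourth, sixth and seventh steps of claim 1, the sketch below selects the events of a single feature according to equation (3), motion-corrects them per equation (4), and alternates a soft event-to-template assignment with an optical-flow update in the spirit of steps 6.2–6.3, stopping on the change threshold σ of step 6.3.5. The Gaussian weighting, the weighted least-squares update, the window size s = 24 pixels, the bandwidth and all function names are assumptions introduced for illustration, since equations (6) and (7) are not reproduced in the text above.

import numpy as np

def select_events(events, f_pos, t_k, t_k1, s=24):
    """Fourth step, eq. (3): keep events inside the s x s window around f_pos
    whose timestamps fall in [t_k, t_k1].  events is an (N, 3) array of
    (x, y, timestamp) rows; s = 24 px lies inside the 20-30 px range of claim 4."""
    half = s / 2.0
    in_window = (np.abs(events[:, 0] - f_pos[0]) <= half) & \
                (np.abs(events[:, 1] - f_pos[1]) <= half)
    in_time = (events[:, 2] >= t_k) & (events[:, 2] <= t_k1)
    return events[in_window & in_time]

def estimate_flow(events, template_pts, t_k, sigma_flow=0.05,
                  bandwidth=2.0, max_iters=50):
    """Sixth step: EM-style alternation between a soft event-to-template
    assignment and an optical-flow update.  The Gaussian weights and the
    weighted least-squares update below are illustrative stand-ins, not the
    patent's equations (6) and (7)."""
    events = np.asarray(events, dtype=float)          # (N, 3): x, y, timestamp
    template_pts = np.asarray(template_pts, dtype=float)  # (M, 2): p_j(t_k)
    v = np.zeros(2)                                   # 6.3.1 initialize the flow
    dt = events[:, 2] - t_k                           # event ages relative to t_k
    for _ in range(max_iters):
        x_corr = events[:, :2] - dt[:, None] * v      # eq. (4): motion correction
        # 6.3.2 soft assignment r_dj between corrected events and template points
        d2 = np.sum((x_corr[:, None, :] - template_pts[None, :, :]) ** 2, axis=2)
        r = np.exp(-d2 / (2.0 * bandwidth ** 2))
        r /= np.maximum(r.sum(axis=1, keepdims=True), 1e-12)
        # 6.3.3 flow update: weighted least squares on x_d - p_j = v * (tau_d - t_k)
        resid = events[:, None, :2] - template_pts[None, :, :]
        num = np.sum(r[:, :, None] * resid * dt[:, None, None], axis=(0, 1))
        den = np.sum(r * (dt[:, None] ** 2)) + 1e-12
        v_new = num / den
        if np.linalg.norm(v_new - v) <= sigma_flow:   # 6.3.5 convergence on sigma
            return v_new
        v = v_new                                     # 6.3.4 keep the updated flow
    return v

# Seventh step: the feature position is advanced by the flow times the elapsed
# time, f_next = f_pos + estimate_flow(...) * (t_k1 - t_k), after which the
# template edge would be refreshed as in the eighth step.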
CN201910672162.2A 2019-07-24 2019-07-24 Feature point tracking method based on event camera Active CN110390685B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910672162.2A CN110390685B (en) 2019-07-24 2019-07-24 Feature point tracking method based on event camera

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910672162.2A CN110390685B (en) 2019-07-24 2019-07-24 Feature point tracking method based on event camera

Publications (2)

Publication Number Publication Date
CN110390685A CN110390685A (en) 2019-10-29
CN110390685B true CN110390685B (en) 2021-03-09

Family

ID=68287327

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910672162.2A Active CN110390685B (en) 2019-07-24 2019-07-24 Feature point tracking method based on event camera

Country Status (1)

Country Link
CN (1) CN110390685B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111062337B (en) * 2019-12-19 2023-08-04 北京迈格威科技有限公司 People stream direction detection method and device, storage medium and electronic equipment
CN113160218B (en) * 2021-05-12 2023-06-20 深圳龙岗智能视听研究院 Method for detecting object motion intensity based on event camera
CN115984336A (en) * 2021-10-14 2023-04-18 华为技术有限公司 Optical flow estimation method and device
CN116188533B (en) * 2023-04-23 2023-08-08 深圳时识科技有限公司 Feature point tracking method and device and electronic equipment


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109426782B (en) * 2017-08-29 2023-09-19 北京三星通信技术研究有限公司 Object detection method and neural network system for object detection
CN107808407B (en) * 2017-10-16 2020-12-18 亿航智能设备(广州)有限公司 Binocular camera-based unmanned aerial vehicle vision SLAM method, unmanned aerial vehicle and storage medium
CN109934862A (en) * 2019-02-22 2019-06-25 上海大学 A kind of binocular vision SLAM method that dotted line feature combines

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019099337A1 (en) * 2017-11-14 2019-05-23 Kaban Technologies Llc Event camera-based deformable object tracking
CN109697726A (en) * 2019-01-09 2019-04-30 厦门大学 A kind of end-to-end target method for estimating based on event camera

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Ignacio Alzugaray et al., "Asynchronous Corner Detection and Tracking for Event Cameras in Real Time", IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 3177-3184, 22 June 2018 *
Alex Zihao Zhu et al., "Event-based feature tracking with probabilistic data association", 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 4465-4470, 3 June 2017 *
Beat Kueng et al., "Low-latency visual odometry using event-based feature tracks", 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 16-23, 14 October 2016 *
Xie Zhen, "Research on Scene Perception Methods Based on UAV Vision", China Doctoral Dissertations Full-text Database, Engineering Science and Technology II, no. 1, pp. C031-73, 15 January 2019 *

Also Published As

Publication number Publication date
CN110390685A (en) 2019-10-29


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant