CN116188533B - Feature point tracking method and device and electronic equipment - Google Patents

Feature point tracking method and device and electronic equipment

Info

Publication number
CN116188533B
CN116188533B (Application CN202310440591.3A)
Authority
CN
China
Prior art keywords
feature point
feature
list
characteristic
points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310440591.3A
Other languages
Chinese (zh)
Other versions
CN116188533A (en)
Inventor
余淮
王翔远
杨文
余磊
王帆
乔宁
陈匡义
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Shizhi Technology Co ltd
Shenzhen Shizhi Technology Co ltd
Original Assignee
Chengdu Shizhi Technology Co ltd
Shenzhen Shizhi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Shizhi Technology Co ltd, Shenzhen Shizhi Technology Co ltd filed Critical Chengdu Shizhi Technology Co ltd
Priority to CN202310440591.3A priority Critical patent/CN116188533B/en
Publication of CN116188533A publication Critical patent/CN116188533A/en
Application granted granted Critical
Publication of CN116188533B publication Critical patent/CN116188533B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20228Disparity calculation for image-based rendering

Abstract

The invention discloses a feature point tracking method, a device, and electronic equipment. To overcome the drawbacks of existing feature point tracking schemes, namely sensitivity to noise, low accuracy, and slow response, the method obtains an event frame by compressing the event stream; creates a feature point coordinate list, a feature point sequence number list, and a feature point tracking times list for the feature points at the current moment and the previous moment; inputs the event frame into an optical flow estimation network to obtain the optical flow of each pixel; and obtains the feature point coordinate list at the current moment from the feature point coordinate list at the previous moment and the optical flow of the corresponding pixels. Through the optical flow estimation network and the feature point coordinate, sequence number, and tracking times lists at the two moments, the invention achieves feature point tracking with high temporal resolution, high dynamic range, fast response, and good robustness. The invention is applicable to the fields of event cameras and computer vision.

Description

Feature point tracking method and device and electronic equipment
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a feature point tracking method and device based on an event camera and electronic equipment.
Background
Feature point (detection and) tracking is one of the important fundamental problems in the field of computer vision. Due to their imaging mechanism, conventional frame cameras inherently suffer from information loss in the time dimension and from a low dynamic range, which can cause computer vision systems based on conventional frame cameras to fail in high-speed or severely illuminated scenes. Unlike traditional frame cameras, event cameras only capture changes in illumination intensity in the scene; they feature ultra-low latency, high dynamic range and low power consumption, making up for the shortcomings of traditional frame cameras in this respect. Technical schemes based on event cameras have therefore gradually become a research hotspot in the field of high-speed computer vision (especially autonomous driving and the like).
The current event-camera-based feature point tracking schemes mainly include the following:
prior art 1: CN110390685B;
prior art 2: CN111899276A;
prior art 3: KR20220155041A.
Current event-camera-based feature point detection and tracking mainly includes methods that treat the event frame as a traditional image frame to extract feature points, methods that detect corners directly on the asynchronous event stream, and the like. These methods rely on hand-designed parameters and on the assumption of photometric consistency, and suffer from sensitivity to noise, low accuracy and slow response, so such feature point tracking schemes lack good robustness.
For this reason, there is a need in the art for a feature point tracking scheme with high time resolution, high dynamic range, high response speed, and good robustness.
Disclosure of Invention
In order to solve or alleviate some or all of the above technical problems, the present invention is implemented by the following technical solutions:
a feature point tracking method, the method comprising the steps of: obtaining an event frame by compressing the event stream frame; creating a characteristic point coordinate list, a characteristic point serial number list and a characteristic point tracking frequency list for the characteristic points at the current moment and the last moment; inputting the event frame into an optical flow estimation network to obtain the optical flow of the pixel point; and obtaining the characteristic point coordinate list at the current moment according to the characteristic point coordinate list at the previous moment and the optical flow of the corresponding pixel point.
In some class of embodiments, the list of feature point coordinates for the current time is updated according to at least one of the following ways:
i) If the feature point coordinates of the feature points exceed the image size range of the event frame, deleting at least the feature point coordinates of the feature points from the feature point coordinate list at the current moment;
ii) if the number of event points in the set range of the feature point coordinates of the feature point is smaller than the threshold value, deleting at least the feature point coordinates of the feature point from the feature point coordinate list at the current moment.
In some embodiments, if the number of feature points in the feature point coordinate list at the current moment is greater than or equal to a first number, the feature point coordinates are undistorted and converted to the normalized plane according to the feature point coordinate list at the current moment, the feature point coordinate list at the previous moment and the camera intrinsic parameters; an essential matrix is then computed using the RANSAC method, the feature points that are outliers are identified, and at least the feature point coordinates of those feature points are deleted from the feature point coordinate list at the current moment.
In some embodiments, in addition to deleting the feature point coordinates of a feature point from the feature point coordinate list at the current moment, the corresponding information of that feature point in the feature point sequence number list and the feature point tracking times list at the current moment is deleted, and its corresponding information in the feature point coordinate list, the feature point sequence number list and the feature point tracking times list at the previous moment is deleted.
In some embodiments, after the various deletion and update operations, the tracking count of every feature point recorded in the feature point coordinate list at the current moment is incremented by 1.
In some embodiments, if the number of feature points in the feature point coordinate list at the current moment is smaller than the maximum feature point detection number, an all-ones mask is created according to the feature point coordinate list and the feature point tracking times list at the current moment; and the feature points are sorted in descending order of their tracking times, then traversed in that order, and the region within a set range around each feature point is set to 0 in the mask.
In some embodiments, the event frame is input into a feature point detection network to obtain new feature points; the feature points located where the mask is 0 are removed, the first n new feature points are retained in descending order of the heat map values output by the feature point detection network, and the information of these n new feature points is added to the feature point coordinate list, the feature point sequence number list and the feature point tracking times list at the current moment; where the integer n equals the maximum feature point detection number minus the number of feature points in the feature point coordinate list at the current moment.
In some embodiments, the feature point coordinate list, the feature point sequence number list and the feature point tracking times list at the current moment are respectively assigned to the feature point coordinate list, the feature point sequence number list and the feature point tracking times list at the previous moment.
In some embodiments, the feature point coordinate list at the current moment is undistorted according to the camera intrinsic parameters and converted to the normalized coordinate plane, yielding an undistorted normalized feature point coordinate list.
In some class of embodiments, the feature points are obtained via a feature point detection network; the feature point detection network includes an encoder and a decoder.
In certain classes of embodiments, the encoder comprises ResNet or VGG; or/and the decoder comprises convolutional layers and ReLU-activated neurons.
In some class of embodiments, inputting the event frame into an encoder to obtain a first feature map; inputting the first feature map into a decoder to obtain a second feature map; removing the last channel of the second feature map after the second feature map passes through Softmax to obtain a third feature map; and performing reshaping on the third characteristic diagram to obtain a heat diagram.
In some embodiments, each pixel on the heat map is traversed, and if the value of the pixel is greater than the second threshold, the pixel is taken as the feature point.
A feature point tracking device configured to perform any of the feature point tracking methods described above.
An electronic device using a feature point tracking apparatus as described above for realizing feature point tracking.
Some or all embodiments of the present invention have the following beneficial technical effects:
1) By utilizing the high temporal resolution and high dynamic range of the event camera, the problems of information loss of traditional cameras in the time dimension and in scenes with severe illumination (high dynamic range) are solved.
2) Based on deep-learning feature detection and optical flow estimation modules, the feature point tracking method extracts and tracks feature points more robustly and faster than the prior art.
3) No distance-measuring sensor is needed, so the cost is low.
4) Combined with sliding-window-based visual-inertial tightly-coupled nonlinear optimization, estimation of the pose or/and the three-dimensional point cloud of the moving camera is completed.
Further advantageous effects will be further described in the preferred embodiments.
The technical solutions/features described above are intended to summarize those described in the detailed description section, and the scopes described may therefore not be exactly the same. However, the new solutions disclosed in this section are also part of the numerous solutions disclosed in this document; the technical features disclosed in this section, the technical features disclosed in the detailed description below, and some contents in the drawings not explicitly described in the specification disclose further solutions in reasonable combination with each other.
The technical solutions formed by combining the technical features disclosed at any position of the invention are used to support the generalization of the technical solutions, the amendment of the patent document, and the disclosure of the technical solutions.
Drawings
FIG. 1 is a flow chart of a visual odometer scheme;
FIG. 2 is a schematic diagram of a feature point detection network;
FIG. 3 is a schematic diagram of an optical flow estimation network in some class of embodiments;
FIG. 4 is a flow chart of a feature point tracking method of the present invention;
fig. 5 is a configuration diagram of an electronic device in an embodiment of the invention.
Detailed Description
Since the various alternatives cannot be exhaustively enumerated, the gist of the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings. Other technical solutions and details not described in detail below generally belong to technical objects or technical features that can be achieved by conventional means in the art and, for reasons of space, are not described in detail here.
Except where used for division, any "/" in this disclosure means a logical "or". The ordinal numbers "first", "second", etc. anywhere in the present invention are used merely as distinguishing labels in the description and do not imply an absolute order in time or space, nor that a term preceded by such an ordinal is necessarily different from the same term preceded by another ordinal.
The present invention describes various elements that may be combined in various embodiments, and these elements may be combined into various methods and products. In the present invention, even if a feature is described only when introducing a method/product scheme, this means that the corresponding product/method scheme also explicitly includes that technical feature.
The description of a step, module, or feature anywhere in this disclosure does not imply that it is the only possible step or feature; based on the disclosed technical solutions, those skilled in the art may implement other embodiments by other technical means. The embodiments of the present invention are generally disclosed as preferred embodiments, but this does not imply that the opposite of a preferred embodiment is excluded from the invention, as long as such an opposite embodiment solves at least one technical problem addressed by the present invention. Based on the gist of the specific embodiments, a person skilled in the art may apply substitution, deletion, addition, combination, reordering, etc. to certain technical features to obtain a technical solution that still follows the inventive concept. Such solutions, which do not depart from the technical idea of the invention, also fall within the protection scope of the invention.
An overall block diagram of the visual-inertial odometry (VIO) method proposed by the present invention is shown in fig. 1. It is configured to be implemented in a visual odometry device, which may be an AI chip, FPGA, GPU or ASIC integrated circuit and related equipment. The system and its corresponding method comprise the following steps:
Because of the asynchronous, sparse nature of events, the event stream must be framed (also known as compressed), i.e. the continuous event stream is assembled into multiple independent frame images, to fit the input of a convolutional neural network. For the event stream from the event camera, events within a time interval are taken and divided into N segments (the number of segments N is a positive integer; the segments are preferably of equal length), and the positive- and negative-polarity data (ON/OFF events) of each segment are accumulated and concatenated respectively to obtain an event frame tensor (abbreviated herein as event frame) E_t of size H×W×2N, where H, W and N respectively denote the image height, the image width and the number of segments. The event frame E_t is the input of both the feature detection network and the optical flow estimation network.
Optionally, the event camera has a resolution of H'×W'. Depending on requirements, the tensor may or may not be resized; if no resizing is performed, H=H' and W=W'. For the event frame tensor E_t, compressing the third dimension 2N to 1 yields what can be regarded as pixels/event points at H×W resolution.
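As an illustration of the framing step above, the following is a minimal sketch (not the patented implementation; the event format (t, x, y, p) and the function name are illustrative assumptions) that accumulates the ON/OFF events of each of the N segments into separate channels of an H×W×2N tensor.

```python
import numpy as np

def frame_events(events, H, W, N, t_start, t_end):
    """Compress events in [t_start, t_end) into an H x W x 2N event frame tensor."""
    E = np.zeros((H, W, 2 * N), dtype=np.float32)
    seg_len = (t_end - t_start) / N
    for t, x, y, p in events:                     # p is the polarity (+1 ON, -1 OFF)
        if not (t_start <= t < t_end):
            continue
        seg = min(int((t - t_start) / seg_len), N - 1)
        ch = 2 * seg + (0 if p > 0 else 1)        # per-segment ON and OFF channels
        E[int(y), int(x), ch] += 1.0              # accumulate event counts
    return E
```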
In addition, inertial Measurement Unit (IMU) data is acquired over the same time interval and pre-integrated.
The overall structure of the feature point detection network is shown in fig. 2; it is a neural network comprising two parts, an encoder and a decoder. The encoder choices include, but are not limited to, VGG or ResNet, etc.; or/and the decoder is composed of convolutional layers and ReLU-activated neurons.
The framed event frame E_t is input into the encoder, which outputs a first feature map F_1 of size H/8×W/8×128. The first feature map F_1 is input into the decoder, which outputs a second feature map F_2 of size H/8×W/8×65; after Softmax, the last channel is removed to obtain a third feature map F_3 of size H/8×W/8×64. The feature map F_3 is reshaped to obtain a heat map of size H×W×1, and the value of each pixel on the heat map is screened against a threshold; for example, pixels whose value is greater than the threshold 1/65 are output as feature points.
The 64 channels of feature map F_3 correspond to an 8×8 region of the original image, and the 65th channel of feature map F_2 is called the dustbin channel. During training, if the i-th position of the 8×8 region contains a feature point, the i-th channel of feature map F_3 should be 1; if there is no feature point in the 8×8 region, the 65th channel should be 1. The advantage of this technique is that, when there is no feature point, the values of the first 64 channels of feature map F_2 are fully compressed (close to 0) after the Softmax processing, so they fall far below the threshold, giving a more accurate feature point decision.
For the reshaping process described above, the H/8×W/8×64 tensor can be regarded as an image of height H/8 and width W/8 with 64 channels per pixel; the 64 channels of each pixel are rearranged into an 8×8 sub-image, so the tensor is converted to size H×W×1.
The specific process of the threshold value screening is, for example, traversing each pixel point on the heat map, and if the value of the pixel point is greater than the second threshold value, taking the pixel point as a feature point.
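The post-processing chain just described (Softmax over the 65 channels, dropping the dustbin channel, reshaping to an H×W heat map, thresholding) can be sketched as follows. This is only an illustrative reading of the description, assuming the decoder output is a PyTorch tensor of shape (1, 65, H/8, W/8); the default threshold 1/65 follows the example above.

```python
import torch
import torch.nn.functional as F

def heatmap_and_keypoints(decoder_out, threshold=1.0 / 65):
    """decoder_out: (1, 65, H/8, W/8) tensor from the decoder."""
    probs = F.softmax(decoder_out, dim=1)          # Softmax over the 65 channels
    probs = probs[:, :-1, :, :]                    # drop the dustbin (65th) channel
    heatmap = F.pixel_shuffle(probs, 8)            # 64 channels -> one 8x8 sub-image each
    heatmap = heatmap[0, 0]                        # (H, W) heat map
    ys, xs = torch.nonzero(heatmap > threshold, as_tuple=True)
    return heatmap, torch.stack([xs, ys], dim=1)   # feature point (x, y) coordinates
```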
To obtain the data set required to train the feature point detection network, a simulated event data set may be generated, for example. Event data are generated for the RGB images in the COCO2014 data set using the event data simulator ESIM. Initial feature point labels are obtained by running the Harris corner detection algorithm on the RGB images. For the event frame tensor E_t of size H×W×2N, each pixel has 2N channels with different values; the values of the channels of each pixel are summed, pixels whose sum is greater than 0 are set to 1 to obtain a mask of size H×W×1, and the feature points at positions where the mask is 1 are retained to obtain the final feature point labels. The simulated event data set can then be used for supervised training of the feature point detection network. Preferably, homography transformation is also used for data augmentation during training.
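A minimal sketch of the label-filtering step above, assuming the simulated event tensor and the Harris corners are already available; function and variable names are illustrative. Keeping only corners where the summed event tensor is non-zero reproduces the mask logic described.

```python
import numpy as np

def filter_labels_by_events(event_tensor, harris_points):
    """event_tensor: H x W x 2N simulated event frame; harris_points: list of (x, y)."""
    # Sum the 2N channels; pixels that received any event form the binary mask.
    mask = (event_tensor.sum(axis=2) > 0).astype(np.uint8)
    # Keep only the Harris corners that fall where the mask is 1.
    return [(x, y) for (x, y) in harris_points if mask[int(y), int(x)] == 1]
```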
Optical flow estimation is implemented through an optical flow estimation network, the choices of which include, but are not limited to, spike-FlowNet (whose architecture is shown in fig. 3), CN115953438A, etc.
Specific reference may be made to: Lee C, Kosta A K, Zhu A Z, et al. Spike-FlowNet: Event-based optical flow estimation with energy-efficient hybrid neural networks[C]// European Conference on Computer Vision. Springer, Cham, 2020: 366-382.
The present invention incorporates by reference in its entirety the above-described optical flow estimation scheme. For example, supervised training of the optical flow estimation network may be accomplished through the MVSEC dataset. The invention is not limited to a specific form of optical flow estimation network and its training method.
Preferably, the optical flow of the pixel points output by the optical flow estimation network is dense optical flow. For example, the optical flow of each pixel is output for h×w pixels.
Referring to fig. 4, a feature point tracking method, which relies on the feature point detection step, specifically comprises:
Create an empty current-moment feature point coordinate list F_cur = {(x_i, y_i)}, a current-moment feature point sequence number list ID_cur and a current-moment feature point tracking times list CNT_cur; create an empty previous-moment feature point coordinate list F_last = {(x_i, y_i)}, a previous-moment feature point sequence number list ID_last and a previous-moment feature point tracking times list CNT_last; set the maximum feature point detection number n_max; and set the camera intrinsic parameters, such as the matrix composed of the focal length, principal point offset, scaling coefficients and the distortion coefficients.
The feature point sequence number list makes it easy to associate feature points across different moments; in other words, the same feature point keeps the same sequence number at different moments. The update operations on the above lists mainly comprise add/delete operations that insert or remove the information of feature points (which may change the list length), and value-update operations on the stored values (such as incrementing a count).
In some classes of embodiments, the aforementioned current-moment lists F_cur, ID_cur, CNT_cur and previous-moment lists F_last, ID_last, CNT_last determine the matching relation through their indices, and this correspondence holds only if the lengths of the current-moment lists and the previous-moment lists are always kept the same. Therefore, whenever an add or delete update is performed on any of the F_cur, ID_cur, CNT_cur lists, the corresponding add or delete update must also be performed on the F_last, ID_last, CNT_last lists; i.e. adding to or deleting from one list implies the corresponding add or delete on the remaining 5 lists.
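The six synchronized lists and their joint add/delete discipline might be organized as below; this is only an illustrative sketch (plain Python lists indexed in lockstep, hypothetical names and default n_max), not the patented data structure.

```python
class TrackState:
    """Six lockstep lists: index i in every list refers to the same feature point."""
    def __init__(self, n_max=150):          # n_max is an illustrative value
        self.F_cur, self.ID_cur, self.CNT_cur = [], [], []
        self.F_last, self.ID_last, self.CNT_last = [], [], []
        self.n_max = n_max

    def delete(self, idx):
        """Delete feature idx from all six lists so the index correspondence holds."""
        for lst in (self.F_cur, self.ID_cur, self.CNT_cur,
                    self.F_last, self.ID_last, self.CNT_last):
            del lst[idx]

    def roll_over(self):
        """Assign the current-moment lists to the previous-moment lists (later step)."""
        self.F_last = list(self.F_cur)
        self.ID_last = list(self.ID_cur)
        self.CNT_last = list(self.CNT_cur)
```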
If the number of feature points in the previous-moment feature point coordinate list F_last is greater than 0: the event frame (tensor) E_t is input into the optical flow estimation network to obtain the optical flow corresponding to each pixel; the coordinates of all feature points are taken from the F_last list, and the coordinates of each feature point are updated according to the optical flow:
for feature points exceeding the event frame image size (H W), they are culled and updated (here, the information of the culled feature points in the list is deleted) F cur 、ID cur 、CNT cur A list.
The number of event points within a set range around each feature point (for example within [x±r, y±r]) is computed from the event frame rather than from the feature point coordinate list; if that number is smaller than a first threshold β, the feature point is culled (regarded as a noise point) and the F_cur, ID_cur, CNT_cur lists are updated.
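The two steps above — propagating F_last by the dense optical flow, then culling out-of-image points and points with too few events in [x±r, y±r] — might look like the following sketch. The parameter values r and β and the event-count convention are assumptions for illustration, not values fixed by the invention.

```python
import numpy as np

def propagate_and_cull(F_last, flow, event_frame, r=3, beta=2):
    """F_last: list of (x, y); flow: H x W x 2 dense optical flow;
    event_frame: H x W x 2N event tensor. r and beta are illustrative."""
    H, W = flow.shape[:2]
    event_count = event_frame.sum(axis=2)          # events accumulated per pixel
    F_cur, kept_idx = [], []
    for i, (x, y) in enumerate(F_last):
        ix = min(max(int(round(x)), 0), W - 1)
        iy = min(max(int(round(y)), 0), H - 1)
        u, v = flow[iy, ix]
        nx, ny = x + u, y + v
        if not (0 <= nx < W and 0 <= ny < H):      # rule: outside the image
            continue
        x0, x1 = max(0, int(nx) - r), min(W, int(nx) + r + 1)
        y0, y1 = max(0, int(ny) - r), min(H, int(ny) + r + 1)
        if event_count[y0:y1, x0:x1].sum() < beta: # rule: too few events -> noise
            continue
        F_cur.append((nx, ny))
        kept_idx.append(i)
    return F_cur, kept_idx  # kept_idx drives the joint deletes on the other lists
```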
If the number of feature points in the current F_cur list is greater than or equal to a first number (illustratively, 8), all feature points are taken from F_last and F_cur, and the two extracted sets of feature points are undistorted according to the camera intrinsic parameters and converted to the normalized plane (specifically, distorted coordinates can be obtained using the distortion coefficients, and the values at undistorted coordinates can be obtained from the correspondence between distorted and undistorted coordinates; an interpolation algorithm may be used in this process). An essential matrix is then computed using the RANSAC method to remove outliers (points that do not fit the current model; for example, when a straight line is being fitted to a set of points, points far from that line). The feature points to be removed (i.e. their corresponding information) are deleted from the F_cur, ID_cur, CNT_cur lists and the F_last, ID_last, CNT_last lists, thereby updating these lists.
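A hedged sketch of this outlier-rejection step using OpenCV: undistort both point sets to the normalized plane, fit an essential matrix with RANSAC, and mark outliers for joint deletion from the six lists. The RANSAC probability and threshold values are illustrative assumptions.

```python
import cv2
import numpy as np

def ransac_outlier_mask(F_last, F_cur, K, dist_coeffs):
    """K: 3x3 camera intrinsic matrix; dist_coeffs: distortion coefficients."""
    p1 = cv2.undistortPoints(np.float32(F_last).reshape(-1, 1, 2), K, dist_coeffs)
    p2 = cv2.undistortPoints(np.float32(F_cur).reshape(-1, 1, 2), K, dist_coeffs)
    # Points are already on the normalized plane, so pass an identity camera matrix.
    _, inlier_mask = cv2.findEssentialMat(p1, p2, np.eye(3),
                                          method=cv2.RANSAC,
                                          prob=0.999, threshold=1e-3)
    return inlier_mask.ravel().astype(bool)   # False entries are outliers to delete
```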
The current-moment feature point tracking times list CNT_cur is updated by adding 1 to each value in it. Specifically, for every feature point recorded in the current-moment feature point coordinate list F_cur after the various culling steps (not including F_last), the corresponding tracking count in CNT_cur is incremented by 1.
Further, if the number of feature points in the current feature point coordinate list F_cur is smaller than the maximum feature point detection number n_max: let the number of feature points at the current moment be n_cur and compute the number of feature points currently needed, n = n_max - n_cur; take the coordinates and tracking counts of all feature points from the current-moment feature point coordinate list and the current-moment feature point tracking times list, and create an all-ones mask. Sort the feature points in descending order of their tracking counts, then traverse them in that order and set the region within a set range around each feature point to 0 in the mask. Input the event frame into the feature point detection network to obtain new feature points, remove the feature points located where the mask is 0, keep the first n new feature points in descending order of the heat map values output by the feature point detection network (i.e. those with larger heat map values), and add them to the F_cur, ID_cur, CNT_cur lists, so that the number of maintained feature points becomes n_max.
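The mask-and-replenish logic above can be sketched as follows, with illustrative names and an assumed suppression radius r: long-lived tracks suppress their neighborhoods first, and the n strongest new detections outside the suppressed regions are kept.

```python
import numpy as np

def replenish_features(F_cur, CNT_cur, heatmap, new_points, n_max, r=10):
    """heatmap: H x W array from the detection network; new_points: list of (x, y)."""
    H, W = heatmap.shape
    mask = np.ones((H, W), dtype=np.uint8)
    # Longest-tracked points suppress their neighborhoods first (descending order).
    for idx in np.argsort(CNT_cur)[::-1]:
        x, y = int(F_cur[idx][0]), int(F_cur[idx][1])
        mask[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1] = 0
    # Keep only new detections outside suppressed regions, strongest response first.
    candidates = [(x, y) for (x, y) in new_points if mask[int(y), int(x)] == 1]
    candidates.sort(key=lambda p: heatmap[int(p[1]), int(p[0])], reverse=True)
    n = n_max - len(F_cur)
    return candidates[:n]   # to be appended to F_cur / ID_cur / CNT_cur
```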
Then, the current-moment feature point coordinate list F_cur and the corresponding ID_cur, CNT_cur are assigned to the previous-moment feature point coordinate list F_last and the corresponding ID_last, CNT_last.
Then, the current-moment feature point coordinate list is undistorted according to the camera intrinsic parameters and converted to the normalized coordinate plane, yielding the undistorted normalized feature point coordinate list F_un.
For some logic of implementing the feature point tracking method, reference may be made specifically to fig. 4, which is not described herein.
With continued reference to fig. 1, a method of camera pose or/and point cloud estimation is described next.
Determine the size of the sliding window and create an empty container F. At each moment, the undistorted normalized feature point coordinate list F_un obtained by detection, together with the F_cur, ID_cur, CNT_cur lists, is stored as one frame of data in the container F (the container F holds a number of such frames) and participates in key frame selection and the subsequent optimization (key frames are a subset of the frames in the container F).
Preferably, to ensure the operating efficiency of the system, the invention adopts a sliding-window-based method and keeps only key frames for optimization. If the average feature parallax between the current frame and the nearest key frame is greater than a threshold, or the number of feature points tracked by the current frame is smaller than a threshold, the current frame is set as a key frame. For frames that need to be shifted out of the window, the marginalization information may be retained, for example using the Schur complement marginalization method.
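A minimal sketch of the key-frame rule just described, assuming matched feature coordinates between the current frame and the latest key frame are available; the parallax and track-count thresholds are illustrative assumptions.

```python
import numpy as np

def is_keyframe(matched_cur, matched_key, n_tracked,
                parallax_thresh=10.0, min_tracked=20):
    """matched_cur / matched_key: (x, y) coordinates of features seen in both frames."""
    if n_tracked < min_tracked or len(matched_cur) == 0:
        return True
    parallax = np.linalg.norm(np.float32(matched_cur) -
                              np.float32(matched_key), axis=1).mean()
    return parallax > parallax_thresh
```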
If initialization (SFM initialization and visual-inertial joint initialization) is currently incomplete, a Structure From Motion (SFM) three-dimensional reconstruction method is used for purely visual initialization, which comprises the following steps (a minimal sketch of steps 2) and 3) is given after the list):
1) For each frame i in the sliding window, compute the average parallax with respect to the last frame; when the average parallax is greater than a certain threshold, set frame i as the reference frame, otherwise initialization fails; i is the sequence number of the frame;
2) Calculating the pose of the current frame relative to the reference frame according to epipolar constraint, and triangulating feature points to obtain pose information and three-dimensional feature point information of the current frame;
3) Solving pose information and three-dimensional feature point information of the rest frames in the sliding window by using PnP;
4) Minimize the reprojection error of all frames in the sliding window (the frames in the container F) using Bundle Adjustment (BA) to obtain the optimized camera poses and three-dimensional feature points, completing the SFM purely visual initialization.
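As referenced above, here is a minimal OpenCV sketch of steps 2) and 3), under the assumption that feature coordinates are already undistorted and normalized; parameter choices are illustrative and this is not the patented implementation.

```python
import cv2
import numpy as np

def init_two_view(pts_ref, pts_cur):
    """pts_ref, pts_cur: N x 2 arrays of normalized (undistorted) coordinates."""
    pts_ref = np.asarray(pts_ref, dtype=np.float64)
    pts_cur = np.asarray(pts_cur, dtype=np.float64)
    E, _ = cv2.findEssentialMat(pts_ref, pts_cur, np.eye(3), method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts_ref, pts_cur, np.eye(3))
    P0 = np.hstack([np.eye(3), np.zeros((3, 1))])        # reference frame
    P1 = np.hstack([R, t])                               # current frame
    X = cv2.triangulatePoints(P0, P1, pts_ref.T, pts_cur.T)
    return R, t, (X[:3] / X[3]).T                        # 3-D feature points

def solve_frame_pnp(points_3d, points_2d):
    """Pose of another frame from triangulated 3-D points (PnP with RANSAC)."""
    _, rvec, tvec, _ = cv2.solvePnPRansac(np.asarray(points_3d, dtype=np.float64),
                                          np.asarray(points_2d, dtype=np.float64),
                                          np.eye(3), None)
    return cv2.Rodrigues(rvec)[0], tvec
```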
Then, in order to restore the real scale, aligning the visual estimation result with the IMU pre-integration result to realize the calibration of the gyroscope bias; solving the direction of the gravity vector, recovering the scale factors of the monocular camera, obtaining the constraint relation between the event camera and the IMU, and completing the visual inertia joint initialization.
If the SFM initialization and the visual-inertial joint initialization have been completed, visual-inertial tightly-coupled nonlinear optimization is performed: the final optimized camera pose information, and preferably also the point cloud information, is output by an optimization method (such as the Levenberg-Marquardt method or the Gauss-Newton method) that minimizes the visual reprojection errors within the sliding window, the IMU measurement and bias residuals, and the marginalization information.
Since predicting pose with only a monocular camera can leave the scale factor between the predicted and the actual pose undetermined, the invention solves this technical problem through the aforementioned initialization, avoiding the additional distance-measuring sensor that prior art 1 relies on.
The invention also discloses an electronic device, which is provided with the characteristic point tracking device or/and the visual inertial odometer device, and is configured to realize the characteristic point tracking method or/and the odometer method. Referring to fig. 5, the electronic device herein may include a visual odometer means, and in some embodiments may include a feature point tracking means therein. The feature point tracking device is used to perform the aforementioned feature point tracking method, and alternatively the feature point detection method may also be run in the feature point tracking device or may be located in another device in the visual odometer device independently of the feature point tracking device. The functional blocks of the present invention that perform these methods may be flexibly implemented in different blocks, ultimately coupled as a complete system, and the present invention is not limited in any way.
In some embodiments, the electronic device may be an unmanned aerial vehicle, a robot for various services (sweeping, delivering meals, dispensing items, etc.), and the like. The feature point tracking device and the visual inertial odometer device may be operated in an FPGA, an ASIC, an AI chip, or a GPU device, and the present invention is not limited to a specific manner.
Although the present invention has been described with reference to specific features and embodiments thereof, various modifications, combinations and substitutions can be made thereto without departing from the invention. The scope of the present application is not limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification; the methods and modules may be practiced in one or more associated, interdependent, cooperating products, methods and systems, or in preceding/subsequent stages.
The specification and drawings should accordingly be regarded as a concise introduction to some embodiments of the technical solutions defined by the appended claims; they are to be construed according to the doctrine of the broadest reasonable interpretation and are intended to cover, as far as possible, all modifications, changes, combinations or equivalents within the scope of the disclosure, while also avoiding unreasonable interpretation.
Those skilled in the art may further improve the technical solutions on the basis of the present invention in order to achieve better technical results or to meet the needs of certain applications. However, even if such a partial improvement/design is creative or/and inventive, a technical solution that relies on the technical idea of the present invention and covers the technical features defined in the claims shall fall within the protection scope of the present invention.
The features recited in the appended claims may be presented in the form of alternative features, or the order of some technical processes or the sequence of organization of materials may be recombined. Having understood the present invention, those skilled in the art can easily change such process orders and material organizations and then use substantially the same means to solve substantially the same technical problem and achieve substantially the same technical effect; therefore, even if such means are explicitly defined in the claims, such modifications, changes and substitutions shall fall within the protection scope of the claims according to the doctrine of equivalents.
The steps and components of the embodiments have been described generally in terms of their functions in the foregoing description in order to clearly illustrate the interchangeability of hardware and software. Whether the various steps or modules described in connection with the embodiments disclosed herein are implemented in hardware, software, or a combination of both depends on the particular application and the design constraints of the solution. Those of ordinary skill in the art may implement the described functions in different ways for each particular application, but such implementations shall not be considered beyond the scope of the claimed invention.

Claims (15)

1. A feature point tracking method, characterized by comprising the following steps:
obtaining an event frame by compressing the event stream;
creating a feature point coordinate list, a feature point sequence number list and a feature point tracking times list for the feature points at the current moment and the previous moment;
inputting the event frame into an optical flow estimation network to obtain the optical flow of each pixel;
obtaining the feature point coordinate list at the current moment according to the feature point coordinate list at the previous moment and the optical flow of the corresponding pixels; and,
the feature points are obtained by inputting an event frame into a feature point detection network;
the feature point coordinate list, the feature point sequence number list and the feature point tracking times list at the current moment have the same lengths as the feature point coordinate list, the feature point sequence number list and the feature point tracking times list at the previous moment.
2. The feature point tracking method according to claim 1, characterized in that:
updating the feature point coordinate list at the current moment according to at least one of the following modes:
i) If the feature point coordinates of the feature points exceed the image size range of the event frame, deleting at least the feature point coordinates of the feature points from the feature point coordinate list at the current moment;
ii) if the number of event points in the set range of the feature point coordinates of the feature point is smaller than the threshold value, deleting at least the feature point coordinates of the feature point from the feature point coordinate list at the current moment.
3. The feature point tracking method according to claim 1, characterized in that:
if the number of feature points in the feature point coordinate list at the current moment is greater than or equal to a first number, the feature point coordinates are undistorted and converted to the normalized plane according to the feature point coordinate list at the current moment, the feature point coordinate list at the previous moment and the camera intrinsic parameters; an essential matrix is then computed using the RANSAC method, the feature points that are outliers are identified, and at least the feature point coordinates of those feature points are deleted from the feature point coordinate list at the current moment.
4. The feature point tracking method according to claim 2 or 3, characterized in that:
in addition to deleting the feature point coordinates of a feature point from the feature point coordinate list at the current moment, the corresponding information of that feature point in the feature point sequence number list and the feature point tracking times list at the current moment is deleted, and its corresponding information in the feature point coordinate list, the feature point sequence number list and the feature point tracking times list at the previous moment is deleted.
5. The feature point tracking method according to claim 4, characterized in that:
after the various deletion and update operations, the tracking count of every feature point recorded in the feature point coordinate list at the current moment is incremented by 1.
6. The feature point tracking method according to any one of claims 1 to 3, 5, characterized in that:
if the number of feature points in the feature point coordinate list at the current moment is smaller than the maximum feature point detection number, an all-ones mask is created according to the feature point coordinate list and the feature point tracking times list at the current moment; and the feature points are sorted in descending order of their tracking times, then traversed in that order, and the region within a set range around each feature point is set to 0 in the mask.
7. The feature point tracking method according to claim 6, characterized in that:
the event frame is input into a feature point detection network to obtain new feature points;
the feature points located where the mask is 0 are removed, the first n new feature points are retained in descending order of the heat map values output by the feature point detection network, and the information of these n new feature points is added to the feature point coordinate list, the feature point sequence number list and the feature point tracking times list at the current moment; wherein the integer n equals the maximum feature point detection number minus the number of feature points in the feature point coordinate list at the current moment.
8. The feature point tracking method according to claim 7, characterized in that:
the feature point coordinate list, the feature point sequence number list and the feature point tracking times list at the current moment are respectively assigned to the feature point coordinate list, the feature point sequence number list and the feature point tracking times list at the previous moment.
9. The feature point tracking method according to claim 8, characterized in that:
the feature point coordinate list at the current moment is undistorted according to the camera intrinsic parameters and converted to the normalized coordinate plane, yielding an undistorted normalized feature point coordinate list.
10. The feature point tracking method according to any one of claims 1 to 3, 5, 7 to 9, characterized in that:
the feature point detection network includes an encoder and a decoder.
11. The feature point tracking method according to claim 10, characterized in that:
the encoder comprises ResNet or VGG; or/and,
the decoder comprises convolutional layers and ReLU-activated neurons.
12. The feature point tracking method according to claim 10, characterized in that:
inputting the event frame into an encoder to obtain a first feature map;
inputting the first feature map into a decoder to obtain a second feature map;
removing the last channel of the second feature map after the second feature map passes through Softmax to obtain a third feature map;
and performing reshaping on the third characteristic diagram to obtain a heat diagram.
13. The feature point tracking method according to claim 12, characterized in that:
each pixel on the heat map is traversed, and if the value of a pixel is greater than a second threshold, that pixel is taken as a feature point.
14. A feature point tracking device, characterized in that:
the device is configured to perform the feature point tracking method of any one of claims 1-13.
15. An electronic device, characterized in that:
the electronic device uses the feature point tracking device of claim 14 to realize feature point tracking.
CN202310440591.3A 2023-04-23 2023-04-23 Feature point tracking method and device and electronic equipment Active CN116188533B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310440591.3A CN116188533B (en) 2023-04-23 2023-04-23 Feature point tracking method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310440591.3A CN116188533B (en) 2023-04-23 2023-04-23 Feature point tracking method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN116188533A CN116188533A (en) 2023-05-30
CN116188533B true CN116188533B (en) 2023-08-08

Family

ID=86440642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310440591.3A Active CN116188533B (en) 2023-04-23 2023-04-23 Feature point tracking method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN116188533B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102800106A (en) * 2012-06-29 2012-11-28 刘怡光 Self-adaptation mean-shift target tracking method based on optical flow field estimation
CN105023278A (en) * 2015-07-01 2015-11-04 中国矿业大学 Movable target tracking method and system based on optical flow approach
CN110390685A (en) * 2019-07-24 2019-10-29 中国人民解放军国防科技大学 Feature point tracking method based on event camera
JP2019204390A (en) * 2018-05-25 2019-11-28 株式会社デンソーアイティーラボラトリ Optical flow estimation device, optical flow estimation method, optical flow estimation system, optical flow estimation program, and yaw rate estimation device, yaw rate estimation method, yaw rate estimation system as well as yaw rate estimation program
CN110570453A (en) * 2019-07-10 2019-12-13 哈尔滨工程大学 Visual odometer method based on binocular vision and closed-loop tracking characteristics
CN111052183A (en) * 2017-09-04 2020-04-21 苏黎世大学 Visual inertial odometer using event camera
CN111417983A (en) * 2017-11-14 2020-07-14 苹果公司 Deformable object tracking based on event camera
CN112686928A (en) * 2021-01-07 2021-04-20 大连理工大学 Moving target visual tracking method based on multi-source information fusion
CN113379839A (en) * 2021-05-25 2021-09-10 武汉大学 Ground visual angle monocular vision odometer method based on event camera system
WO2022042184A1 (en) * 2020-08-31 2022-03-03 深圳市道通智能航空技术股份有限公司 Method and apparatus for estimating position of tracking target, and unmanned aerial vehicle
CN114627150A (en) * 2022-03-14 2022-06-14 首都师范大学 Data processing and motion estimation method and device based on event camera
CN115479602A (en) * 2022-10-14 2022-12-16 北京航空航天大学 Visual inertial odometer method fusing event and distance
CN115601403A (en) * 2022-09-15 2023-01-13 首都师范大学(Cn) Event camera optical flow estimation method and device based on self-attention mechanism
CN115953438A (en) * 2023-03-16 2023-04-11 深圳时识科技有限公司 Optical flow estimation method and device, chip and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11288818B2 (en) * 2019-02-19 2022-03-29 The Trustees Of The University Of Pennsylvania Methods, systems, and computer readable media for estimation of optical flow, depth, and egomotion using neural network trained using event-based learning

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102800106A (en) * 2012-06-29 2012-11-28 刘怡光 Self-adaptation mean-shift target tracking method based on optical flow field estimation
CN105023278A (en) * 2015-07-01 2015-11-04 中国矿业大学 Movable target tracking method and system based on optical flow approach
CN111052183A (en) * 2017-09-04 2020-04-21 苏黎世大学 Visual inertial odometer using event camera
CN111417983A (en) * 2017-11-14 2020-07-14 苹果公司 Deformable object tracking based on event camera
JP2019204390A (en) * 2018-05-25 2019-11-28 株式会社デンソーアイティーラボラトリ Optical flow estimation device, optical flow estimation method, optical flow estimation system, optical flow estimation program, and yaw rate estimation device, yaw rate estimation method, yaw rate estimation system as well as yaw rate estimation program
CN110570453A (en) * 2019-07-10 2019-12-13 哈尔滨工程大学 Visual odometer method based on binocular vision and closed-loop tracking characteristics
CN110390685A (en) * 2019-07-24 2019-10-29 中国人民解放军国防科技大学 Feature point tracking method based on event camera
WO2022042184A1 (en) * 2020-08-31 2022-03-03 深圳市道通智能航空技术股份有限公司 Method and apparatus for estimating position of tracking target, and unmanned aerial vehicle
CN112686928A (en) * 2021-01-07 2021-04-20 大连理工大学 Moving target visual tracking method based on multi-source information fusion
CN113379839A (en) * 2021-05-25 2021-09-10 武汉大学 Ground visual angle monocular vision odometer method based on event camera system
CN114627150A (en) * 2022-03-14 2022-06-14 首都师范大学 Data processing and motion estimation method and device based on event camera
CN115601403A (en) * 2022-09-15 2023-01-13 首都师范大学(Cn) Event camera optical flow estimation method and device based on self-attention mechanism
CN115479602A (en) * 2022-10-14 2022-12-16 北京航空航天大学 Visual inertial odometer method fusing event and distance
CN115953438A (en) * 2023-03-16 2023-04-11 深圳时识科技有限公司 Optical flow estimation method and device, chip and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Sparse optical flow target extraction and tracking against dynamic backgrounds; Lan Hong et al.; Journal of Image and Graphics (Issue 06); pp. 771-780 *

Also Published As

Publication number Publication date
CN116188533A (en) 2023-05-30

Similar Documents

Publication Publication Date Title
CN108596974B (en) Dynamic scene robot positioning and mapping system and method
Najibi et al. Dops: Learning to detect 3d objects and predict their 3d shapes
Vaudrey et al. Differences between stereo and motion behaviour on synthetic and real-world stereo sequences
CN106875437B (en) RGBD three-dimensional reconstruction-oriented key frame extraction method
CN110688905B (en) Three-dimensional object detection and tracking method based on key frame
CN112529944B (en) End-to-end unsupervised optical flow estimation method based on event camera
CN110706269B (en) Binocular vision SLAM-based dynamic scene dense modeling method
CN106530407A (en) Three-dimensional panoramic splicing method, device and system for virtual reality
CN112530019A (en) Three-dimensional human body reconstruction method and device, computer equipment and storage medium
CN114708295B (en) Logistics parcel separation method based on Transformer
CN112465021B (en) Pose track estimation method based on image frame interpolation method
CN113516664A (en) Visual SLAM method based on semantic segmentation dynamic points
Nehvi et al. Differentiable event stream simulator for non-rigid 3d tracking
Biswas et al. HALSIE: Hybrid Approach to Learning Segmentation by Simultaneously Exploiting Image and Event Modalities
CN113408550B (en) Intelligent weighing management system based on image processing
CN105957060B (en) A kind of TVS event cluster-dividing method based on optical flow analysis
CN116188533B (en) Feature point tracking method and device and electronic equipment
CN116188536B (en) Visual inertial odometer method and device and electronic equipment
Das Biswas et al. HALSIE: Hybrid Approach to Learning Segmentation by Simultaneously Exploiting Image and Event Modalities
CN110516527B (en) Visual SLAM loop detection improvement method based on instance segmentation
CN115761177A (en) Meta-universe-oriented three-dimensional reconstruction method for cross-border financial places
CN115984592A (en) Point-line fusion feature matching method based on SuperPoint + SuperGlue
CN115273080A (en) Lightweight visual semantic odometer method for dynamic scene
CN110853040B (en) Image collaborative segmentation method based on super-resolution reconstruction
CN114155406A (en) Pose estimation method based on region-level feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant