CN113870322A - Event camera-based multi-target tracking method and device and computer equipment - Google Patents

Event camera-based multi-target tracking method and device and computer equipment

Info

Publication number
CN113870322A
CN113870322A
Authority
CN
China
Prior art keywords
event
trigger
determining
point
trigger events
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110968923.6A
Other languages
Chinese (zh)
Inventor
粟傈
王向禹
杨帆
李金健
胡权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Capital Normal University
Original Assignee
Beijing Institute of Technology BIT
Capital Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT, Capital Normal University filed Critical Beijing Institute of Technology BIT
Priority to CN202110968923.6A priority Critical patent/CN113870322A/en
Publication of CN113870322A publication Critical patent/CN113870322A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/269Analysis of motion using gradient-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • G06T7/66Analysis of geometric attributes of image moments or centre of gravity

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an event camera-based multi-target tracking method, device and computer equipment, wherein the method comprises: acquiring the number of trigger events in each sampling period of a plurality of continuous sampling periods in a detected scene; if the number of trigger events in the plurality of continuous sampling periods all exceeds a preset environmental noise threshold, processing the trigger events in the current latest sampling period through a DBSCAN clustering algorithm to determine the number of moving objects in the detected scene and the initial position triggered by each moving object; and determining event image frames containing all moving objects in the detected scene according to the data stream of trigger events, and tracking the moving objects according to the event image frames. The method and device can quickly and accurately track and position multiple moving objects in the detected scene while ensuring low latency and an extremely low computational load, reducing power consumption, and allowing deployment on mobile platforms.

Description

Event camera-based multi-target tracking method and device and computer equipment
Technical Field
The application relates to the technical field of computer vision, in particular to a multi-target tracking method and device based on an event camera and computer equipment.
Background
Currently, there are two main methods for detecting and tracking a target object: (1) traditional image processing, which requires preprocessing such as image smoothing and feature extraction, followed by moving-object detection with classical algorithms such as the frame-difference method, or detection of objects of a specific color or shape through color matching and boundary recognition; (2) neural networks, deep learning and related techniques, which detect and track a specific target object on conventional images. However, to ensure the accuracy of detecting and tracking the target object, both methods must process the entire image, which generates a large computational load and requires strong hardware support. In addition, for an object moving at high speed, the imaging principle of a traditional camera causes motion blur in the final image, which complicates detection and tracking, introduces high latency, and cannot guarantee fast response in high-speed scenes.
Disclosure of Invention
The present application is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, a first objective of the present application is to provide an event camera-based multi-target tracking method that overcomes the disadvantages of existing methods: it quickly and accurately tracks and positions multiple moving objects in a detected scene, ensures low latency and an extremely low computational load, reduces power consumption, and can be applied to mobile platforms.
A second objective of the present application is to provide a multi-target tracking device based on an event camera.
A third object of the present application is to propose a computer device.
In order to achieve the above object, an embodiment of the present application in a first aspect provides a method for multi-target tracking based on an event camera, including:
acquiring the number of trigger events in each sampling period of a plurality of continuous sampling periods in a detected scene;
if the number of trigger events in a plurality of continuous sampling periods exceeds a preset environmental noise threshold value, processing the trigger events in the current latest sampling period through a DBSCAN clustering algorithm to determine the number of moving objects in the detected scene and the initial position triggered by each moving object;
and determining event image frames containing all moving objects in the detected scene according to the data stream of the trigger event, and tracking the moving objects according to the event image frames.
According to the event camera-based multi-target tracking method of the embodiment of the present application, the number of trigger events in each sampling period of a plurality of continuous sampling periods in a detected scene is acquired; if the number of trigger events in the plurality of continuous sampling periods all exceeds a preset environmental noise threshold, the trigger events in the current latest sampling period are processed through a DBSCAN clustering algorithm to determine the number of moving objects in the detected scene and the initial position triggered by each moving object; and event image frames containing all moving objects in the detected scene are determined according to the data stream of trigger events, and the moving objects are tracked according to the event image frames. The method can quickly and accurately track and position multiple moving objects in the detected scene while ensuring low latency and an extremely low computational load, reducing power consumption, and allowing deployment on mobile platforms.
Optionally, in an embodiment of the present application, before acquiring the number of trigger events in each of a plurality of sampling periods in the scene to be tested, the method further includes:
acquiring the number of trigger events in each sampling period of a plurality of continuous sampling periods in the scene to be tested, wherein the scene to be tested is a static scene;
and calculating the variance of the number of the trigger events in a plurality of continuous sampling periods, and if the variance is smaller than a preset threshold, determining the preset environmental noise threshold according to the mean value of the number of the trigger events in the plurality of continuous sampling periods.
Optionally, in an embodiment of the present application, before the processing, by the DBSCAN clustering algorithm, the triggering events in the plurality of sampling periods, the method includes:
and determining the radius of the adjacent area matched with the trigger event and the preset number of the trigger events in the area of the adjacent area radius.
Optionally, in an embodiment of the present application, the processing, by the DBSCAN clustering algorithm, the trigger events in the multiple sampling periods to determine the number of moving objects in the detected scene and an initial position triggered by each of the moving objects specifically includes:
determining the actual number of the trigger events contained in the adjacent area radius by any trigger event in the trigger events through the DBSCAN clustering algorithm, wherein if the actual number is not less than the preset number, the category of any trigger event is determined to be a core point;
if the actual number is smaller than the preset number, but the trigger event is within the adjacent area radius of any core point, determining that the category of any trigger event is an edge point;
if the actual number is smaller than the preset number and the trigger event is not within the radius of the adjacent area of any core point, determining that the category of any trigger event is a noise point;
and determining the number of moving objects in the detected scene and the initial position triggered by each moving object according to the noise points, the core points and the edge points.
Optionally, in an embodiment of the present application, the category of any trigger event is determined by:
$$\mathrm{category}(p)=\begin{cases}\text{core}, & \mathrm{COUNT}(N(p,\varepsilon)) \ge \text{minPts}\\ \text{border}, & \mathrm{COUNT}(N(p,\varepsilon)) < \text{minPts}\ \text{and}\ p \in N(q,\varepsilon)\ \text{for some core point } q\\ \text{noise}, & \text{otherwise}\end{cases}$$

wherein p is the trigger event being classified, core is the core point, border is the edge point, noise is the noise point, ε is the radius of the adjacent area matched with the trigger event, COUNT(N(p, ε)) is the actual number of trigger events contained in the area of radius ε around point p, and minPts is the preset number of trigger events required in the area of radius ε.
Optionally, in an embodiment of the present application, the method further includes:
and carrying out weight distribution on the DBSCAN clustering result of the initial position triggered by each moving object:
if any trigger event is a core point of the initial position, the weight corresponding to that point is a first weight,
and if any trigger event is an edge point of the initial position, the weight corresponding to that point is a second weight, wherein the first weight is greater than the second weight.
Optionally, in an embodiment of the present application, the tracking all moving objects according to the event image frame includes:
determining the moving direction and distance of a moving object in the event image frames by adopting an optical flow error algorithm, wherein the event image frames comprise a first event image frame and a second event image frame;
the determining the moving direction and the distance of the moving object corresponding to the event image frame by adopting the optical flow error algorithm specifically comprises the following steps:
determining a first coordinate position of a center of mass point of the moving object in the first event image frame;
determining a second coordinate position of the centroid point in the second event image frame after one or more of the preset sampling periods;
and determining the motion component of the center of mass point in the coordinate axis direction according to the first coordinate position and the second coordinate position, and determining the motion track of the moving object according to the motion component of the center of mass point.
Optionally, in an embodiment of the present application, the determining a motion trajectory of the moving object according to the motion component of the centroid point includes: obtaining the motion component of the centroid point in the coordinate axis direction when the optical flow error is minimum by the following formula to determine the motion track of the moving object:
$$e(d)=e(d_x,d_y)=\sum_{(x,y)\in C}\mathrm{weight}(x,y)\,\bigl(I(x,y)-J(x+d_x,\,y+d_y)\bigr)^2$$

wherein e(d) is the optical flow error, C is the coordinate set of trigger events formed by the DBSCAN clustering result of the initial position triggered by any moving object, (x, y) is the coordinate of any trigger event in C on the first event image frame, dx and dy are the motion components of the centroid point of the moving object along the coordinate axes after one or more preset sampling periods, (x + dx, y + dy) is the position of that trigger event in C on the second event image frame, I(x, y) is the pixel brightness value of that trigger event in C on the first event image frame, J(x + dx, y + dy) is the pixel brightness value of that trigger event in C on the second event image frame, and weight(x, y) is the weight coefficient, which takes different values for core points and edge points.
To achieve the above object, a second aspect of the present application provides an event camera-based multi-target tracking apparatus, including:
the acquisition module is used for acquiring the number of trigger events in each sampling period of a plurality of continuous sampling periods in a detected scene;
the determining module is used for processing the triggering event in the current latest sampling period through a DBSCAN clustering algorithm to determine the number of moving objects in the detected scene and the initial position triggered by each moving object if the number of the triggering events in a plurality of continuous sampling periods exceeds a preset environmental noise threshold;
and the tracking module is used for determining the positions of all moving objects in the detected scene in the event image frame according to the data stream of the trigger event and tracking the moving objects according to the event image frame.
According to the event camera-based multi-target tracking device of the embodiment of the present application, the number of trigger events in each sampling period of a plurality of continuous sampling periods in a detected scene is acquired; if the number of trigger events in the plurality of continuous sampling periods all exceeds a preset environmental noise threshold, the trigger events in the current latest sampling period are processed through a DBSCAN clustering algorithm to determine the number of moving objects in the detected scene and the initial position triggered by each moving object; and event image frames containing all moving objects in the detected scene are determined according to the data stream of trigger events, and the moving objects are tracked according to the event image frames. The device can quickly and accurately track and position multiple moving objects in the detected scene while ensuring low latency and an extremely low computational load, reducing power consumption, and allowing deployment on mobile platforms.
To achieve the above object, a third aspect of the present application provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the method according to the first aspect of the present application is implemented.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic diagram of event camera data provided in an embodiment of the present application in comparison to conventional camera output;
FIG. 2 is a flowchart illustrating a multi-target tracking method based on an event camera according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram illustrating a trigger event generated by a moving object according to an embodiment of the present application;
fig. 4 is a schematic diagram of an event camera output data stream and a DBSCAN clustering result thereof provided in an embodiment of the present application;
FIG. 5 is a schematic diagram of a moving object tracking system according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a multi-target tracking apparatus based on an event camera according to an embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary, intended to explain the present application, and are not to be construed as limiting the present application. On the contrary, the embodiments of the application include all changes, modifications and equivalents coming within the spirit and scope of the appended claims.
Based on the above description, the problem in the field of computer vision technology is that, for a high-speed moving object, the image finally acquired by a conventional camera suffers from motion blur, which complicates detection and tracking; the latency is high, so rapid response cannot be guaranteed in high-speed scenes; and the whole image needs to be processed, which generates a large computational load. To address these problems, the present application provides an event camera-based multi-target tracking method that can quickly and accurately track and position multiple moving objects in a detected scene while ensuring an extremely low computational load, reducing power consumption, and remaining applicable to mobile platforms.
Compared with a traditional camera, the event camera can track moving objects in dark conditions at night or under backlit overexposure, has a wider detection range, and is suitable for more scenarios. Compared with detection methods used with traditional cameras, such as the frame-difference method, the event camera can sense the occurrence of a moving object more quickly in the form of a data stream, and its detection frequency can be adjusted to provide sensing intervals from a few milliseconds to tens of milliseconds, greatly improving efficiency. A comparison of the outputs of an event camera and a traditional camera is shown in Fig. 1.
The following describes a multi-target tracking method, device and computer equipment based on an event camera according to an embodiment of the present application with reference to the drawings.
Fig. 2 is a flowchart of a multi-target tracking method based on an event camera according to an embodiment of the present disclosure.
As shown in fig. 2, the multi-target tracking method based on an event camera provided in the embodiment of the present application includes the following steps:
step 110, acquiring the number of trigger events in each sampling period of a plurality of continuous sampling periods in a scene to be tested;
step 120, if the number of trigger events in a plurality of continuous sampling periods exceeds a preset environmental noise threshold, processing the trigger events in the current latest sampling period through a DBSCAN clustering algorithm to determine the number of moving objects in the detected scene and the initial position triggered by each moving object;
and step 130, restoring the event data stream into event image frames according to the data stream of trigger events, determining the centroid points of all moving objects in the current event image frame, and tracking the detected moving objects according to the event image frames.
Specifically, the event camera is a sensor that responds to pixel brightness changes and can respond to individual pixels asynchronously. When the brightness change of a pixel exceeds a given threshold, the pixel is triggered and outputs a trigger event. The brightness change value is determined by the following formula:
$$d = L_t(x,y) - L_{t_0}(x,y)$$

wherein d is the brightness change of the pixel point between times t0 and t, (x, y) is the position of the trigger event, L_t(x, y) is the brightness value of the pixel point at time t, L_{t0}(x, y) is the brightness value of the pixel point at time t0, t is the timestamp of the current trigger event, and t0 is the timestamp of the previous trigger event.
Specifically, the data output by the event camera is in the form of a streaming output data stream, and is encoded in time sequence, and each trigger event is represented by a quadruple (x, y, t, m), where (x, y) is the position of the trigger event, t is the timestamp of the current trigger event, and m is the polarity of the brightness change.
In addition, in the above embodiment, when the absolute value of d is smaller than the set threshold, no event is triggered and m is set to 0;
when the absolute value of d is not smaller than the set threshold and d is greater than zero, a positive trigger event is generated and m is set to +1;
when the absolute value of d is not smaller than the set threshold and d is less than zero, a negative trigger event is generated and m is set to −1.
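For illustration only, the triggering rule above can be sketched in Python as follows (the Event type and function names are hypothetical, not part of the application):

```python
from typing import NamedTuple

class Event(NamedTuple):
    """One trigger event of the quadruple (x, y, t, m)."""
    x: int      # pixel column of the trigger event
    y: int      # pixel row of the trigger event
    t: float    # timestamp of the current trigger event
    m: int      # polarity of the brightness change: +1, -1

def polarity(l_t: float, l_t0: float, threshold: float) -> int:
    """Classify the brightness change d = L_t(x, y) - L_t0(x, y)
    against a set threshold, as described above."""
    d = l_t - l_t0
    if abs(d) < threshold:
        return 0          # |d| below threshold: no event is triggered
    return 1 if d > 0 else -1
```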
Further, in an embodiment of the present application, before step 110, the method further includes:
step 101, acquiring the number of trigger events in each sampling period of a plurality of continuous sampling periods in the detected scene, wherein the detected scene is a static scene;
step 102, calculating a variance of the number of trigger events in a plurality of sampling periods, and if the variance is smaller than a preset threshold, determining the preset environmental noise threshold according to a mean value of the number of trigger events in the plurality of sampling periods.
It should be noted that the noise level of the event camera in the above embodiments is significantly higher than that of a conventional camera, including both the intrinsic noise of the event camera and noise caused by ambient light. To reduce the influence of environmental noise as much as possible and to reduce the computational load, the detected scene needs to be dynamically sampled at a fixed sampling period before the program starts to run.
Specifically, dynamic sampling may be performed by:
$$E = \mathrm{COUNT}\bigl(\{(x,y,t) \mid t_s \le t < t_s+\Delta t\}\bigr)$$

wherein E is the number of all trigger events in the detected scene within one sampling period, Δt is the sampling period, t_s is the start time of the sampling period, (x, y) is the position of a trigger event, and t is the timestamp of the current trigger event;
further, to ensure that a moving object can be detected in a timely manner, a shorter sampling period is usually adopted, including but not limited to 1ms, 3ms, 5ms, and 10 ms.
Specifically, after sampling is completed, in order to ensure the accuracy of the environmental noise estimate, the variance of the number of trigger events over the continuous sampling periods is determined by the following formula:

$$S^2 = \frac{1}{n}\sum_{i=1}^{n}\bigl(E_i - \bar{E}\bigr)^2$$

wherein S² is the variance of the number of trigger events in the continuous sampling periods, n is the number of continuous sampling periods, E_i is the number of all trigger events in the detected scene in the i-th sampling period, and \bar{E} is the mean of the number of trigger events over the continuous sampling periods;
further, the variance S of the number of trigger events in successive sampling periods2When the threshold value is smaller than the set threshold value, the environmental noise threshold value is determined by the following formula:
Figure BDA0003225206360000091
wherein T is an environmental noise threshold, n is the number of continuous sampling periods, Ei represents the number of all trigger events in a detected scene in the ith sampling period, and k is a proportionality coefficient;
further, in order to further reduce the influence of the environmental noise, the proportionality coefficient k in the embodiment of the present application may be set between 1.5 and 2.5, for example, k may be 1.5, 1.6, 1.65, 1.7, 1.75, 1.8, 1.85, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2, 5, etc., that is, the proportionality coefficient k may be set between the interval 1.5 and 2.5.
In this embodiment of the present application, before the processing the trigger events in the plurality of sampling periods by the DBSCAN clustering algorithm, the method includes:
and determining the radius of the adjacent area matched with the trigger event and the preset number of the trigger events in the area of the adjacent area radius.
In this embodiment of the present application, the processing, by using the DBSCAN clustering algorithm, the trigger events in the consecutive multiple sampling periods to determine the number of moving objects in the detected scene and the initial position triggered by each of the moving objects specifically includes:
determining the actual number of the trigger events contained in the adjacent area radius by any trigger event in the trigger events through the DBSCAN clustering algorithm, wherein if the actual number is not less than the preset number, the category of any trigger event is determined to be a core point;
if the actual number is smaller than the preset number, but the trigger event is within the adjacent area radius of any core point, determining that the category of any trigger event is an edge point;
if the actual number is smaller than the preset number and the trigger event is not within the radius of the adjacent area of any core point, determining that the category of any trigger event is a noise point;
and determining the number of moving objects in the detected scene and the initial position triggered by each moving object according to the noise points, the core points and the edge points.
Specifically, due to the characteristics of the event camera, most trigger events in the data stream produced by a moving object are generated at the edges of the object; the trigger events generated by a moving object are shown in Fig. 3. Different objects can be well separated using DBSCAN clustering. A schematic diagram of the event camera output data stream and its DBSCAN clustering result is shown in Fig. 4, where Fig. 4(a) shows the event camera output data stream and Fig. 4(b) shows the corresponding DBSCAN clustering result.
Further, because event camera noise is generated independently, a single noise point is unrelated to the surrounding trigger events, so noise points can be filtered out effectively by the DBSCAN clustering algorithm.
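To make this clustering step concrete, the following sketch applies scikit-learn's DBSCAN to the (x, y) coordinates of the trigger events in the latest sampling period; the parameter values and helper name are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_events(events_xy, eps=5.0, min_pts=10):
    """Cluster trigger-event coordinates to count moving objects.

    events_xy -- (N, 2) array of (x, y) positions of trigger events in the
                 current sampling period
    eps       -- radius of the adjacent area matched to the events (assumed)
    min_pts   -- preset number of events required within radius eps (assumed)

    Returns the number of clusters (moving objects) and the centroid of each
    cluster, used as the initial triggered position.
    """
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(events_xy)
    centroids = []
    for label in sorted(set(labels) - {-1}):     # label -1 marks noise points
        members = events_xy[labels == label]
        centroids.append(members.mean(axis=0))   # centroid = initial position
    return len(centroids), np.array(centroids)
```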
In the embodiment of the present application, the category of any one of the trigger events is determined by the following formula:
$$\mathrm{category}(p)=\begin{cases}\text{core}, & \mathrm{COUNT}(N(p,\varepsilon)) \ge \text{minPts}\\ \text{border}, & \mathrm{COUNT}(N(p,\varepsilon)) < \text{minPts}\ \text{and}\ p \in N(q,\varepsilon)\ \text{for some core point } q\\ \text{noise}, & \text{otherwise}\end{cases}$$

wherein p is the trigger event being classified, core is the core point, border is the edge point, noise is the noise point, ε is the radius of the adjacent area matched with the trigger event, COUNT(N(p, ε)) is the actual number of trigger events contained in the area of radius ε around point p, and minPts is the preset number of trigger events required in the area of radius ε.
Specifically, the category of any one of the trigger events is determined by the following procedure:
For the sample set D = {x1, x2, …, xn}, mark all points as unprocessed.
for each object p in the data set D
    if p has already been classified then continue
    compute N(p, ε)
    if COUNT(N(p, ε)) < minPts then mark p as a noise point
    else mark p as a core point, create a new cluster C, and put p into cluster C
        S = {all points q in the ε-neighborhood of p}
        for each element t in S
            if t has been marked as a noise point then re-mark it as an edge point
            if t has already been classified then continue
            classify t into cluster C
            if COUNT(N(t, ε)) ≥ minPts then add all points in N(t, ε) to S
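A runnable Python rendering of the procedure above, kept deliberately close to the pseudocode (a sketch for illustration; the array layout and function names are assumptions):

```python
import numpy as np

def dbscan_events(points, eps, min_pts):
    """Label each trigger event as core, edge, or noise and assign cluster
    ids, following the expansion procedure described above."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    CORE, EDGE, NOISE = 2, 1, 0
    category = np.full(n, NOISE)       # provisional category per point
    cluster = np.full(n, -1)           # -1 = not yet classified
    current = -1

    def neighborhood(i):
        """Indices of all points within radius eps of point i, i.e. N(p, eps)."""
        dist = np.linalg.norm(points - points[i], axis=1)
        return list(np.flatnonzero(dist <= eps))

    for p in range(n):
        if cluster[p] != -1:           # p has already been classified
            continue
        nbrs = neighborhood(p)
        if len(nbrs) < min_pts:        # too few neighbors: leave marked as noise
            continue
        current += 1                   # open a new cluster C
        category[p] = CORE
        cluster[p] = current
        seeds = [q for q in nbrs if q != p]
        while seeds:                   # expand cluster C through its seeds
            t = seeds.pop()
            if cluster[t] != -1:
                continue
            cluster[t] = current
            t_nbrs = neighborhood(t)
            if len(t_nbrs) >= min_pts:
                category[t] = CORE     # t is itself a core point
                seeds.extend(t_nbrs)   # grow the cluster through t
            else:
                category[t] = EDGE     # a former noise point becomes an edge point
    return category, cluster
```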
In this embodiment, the tracking a moving object according to the event image frame includes:
determining the moving direction and distance of a moving object in the event image frames by adopting an optical flow error algorithm, wherein the event image frames comprise a first event image frame and a second event image frame;
the determining the moving direction and the distance of the moving object corresponding to the event image frame by adopting the optical flow error algorithm specifically comprises the following steps:
determining a first coordinate position of a center of mass point of the moving object in the first event image frame;
determining a second coordinate position of the centroid point in the second event image frame after one or more of the preset sampling periods;
and determining the motion component of the center of mass point in the coordinate axis direction according to the first coordinate position and the second coordinate position, and determining the motion track of the moving object according to the motion component of the center of mass point.
It should be noted that traditional optical flow computation only calculates the grayscale matching error of a single key point. In the present application, all events triggered by a moving object are treated as a whole, the overall optical flow error is calculated, and a higher weight is given to the core points obtained by the DBSCAN algorithm, so as to determine the moving direction and distance of the moving object.
In an embodiment of the present application, the determining a motion trajectory of the moving object according to the motion component of the centroid point includes: obtaining the motion component of the centroid point in the coordinate axis direction when the optical flow error is minimum by the following formula to determine the motion track of the moving object:
$$e(d)=e(d_x,d_y)=\sum_{(x,y)\in C}\mathrm{weight}(x,y)\,\bigl(I(x,y)-J(x+d_x,\,y+d_y)\bigr)^2$$

wherein e(d) is the optical flow error, C is the coordinate set of trigger events formed by the DBSCAN clustering result of the initial position triggered by any moving object, (x, y) is the coordinate of any trigger event in C on the first event image frame, dx and dy are the motion components of any trigger event in C along the coordinate axes after one or more preset sampling periods, (x + dx, y + dy) is the position of that trigger event in C on the second event image frame, I(x, y) is the pixel brightness value of that trigger event in C on the first event image frame, J(x + dx, y + dy) is the pixel brightness value of that trigger event in C on the second event image frame, and weight(x, y) is the weight coefficient, which takes different values for core points and edge points.
It should be noted that, for a detected moving object, the centroid point of the moving object is taken as its center position, and this center position is tracked. Moving-object tracking is shown in Fig. 5: Fig. 5(a) is a tracking diagram of the second frame, Fig. 5(b) is a tracking diagram of the third frame, and the line segments in the figures are the tracking tracks.
Specifically, the event image frame is a binary image, and when the motion trajectory of the object is calculated according to the optical flow, the calculated optical flow error can better reflect the deviation condition of the motion.
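For illustration, a brute-force NumPy sketch of minimizing the weighted optical flow error over a small displacement search window (the window half-width and function names are assumptions; the application does not prescribe a particular search strategy):

```python
import numpy as np

def track_displacement(frame1, frame2, coords, weights, search=8):
    """Find (dx, dy) minimizing the weighted optical flow error
    e(dx, dy) = sum_{(x,y) in C} weight(x,y) * (I(x,y) - J(x+dx, y+dy))^2.

    frame1, frame2 -- binary event image frames I and J (2D arrays, row = y)
    coords         -- (N, 2) integer (x, y) coordinates of the cluster C
    weights        -- weight per event: higher for core points than edge points
    search         -- assumed half-width of the displacement search window
    """
    h, w = frame1.shape
    xs, ys = coords[:, 0], coords[:, 1]
    I = frame1[ys, xs].astype(float)           # brightness on the first frame
    best, best_d = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            nx, ny = xs + dx, ys + dy
            valid = (nx >= 0) & (nx < w) & (ny >= 0) & (ny < h)
            J = np.zeros_like(I)               # out-of-frame pixels read as 0
            J[valid] = frame2[ny[valid], nx[valid]]
            err = np.sum(weights * (I - J) ** 2)   # weighted optical flow error
            if err < best:
                best, best_d = err, (dx, dy)
    return best_d                              # motion components (dx, dy)
```

The minimizing (dx, dy) gives the per-period motion components of the cluster's centroid, from which the motion trajectory is accumulated frame by frame.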
According to the event camera-based multi-target tracking method of the embodiment of the present application, the number of trigger events in each sampling period of a plurality of continuous sampling periods in a detected scene is acquired; if the number of trigger events in the plurality of continuous sampling periods all exceeds a preset environmental noise threshold, the trigger events in the current latest sampling period are processed through a DBSCAN clustering algorithm to determine the number of moving objects in the detected scene and the initial position triggered by each moving object; and event image frames containing all moving objects in the detected scene are determined according to the data stream of trigger events, and the moving objects are tracked according to the event image frames. The method can quickly and accurately track and position multiple moving objects in the detected scene while ensuring low latency and an extremely low computational load, reducing power consumption, and allowing deployment on mobile platforms.
In order to implement the above embodiments, the present application further provides a multi-target tracking device based on an event camera.
Fig. 6 is a schematic structural diagram of a multi-target tracking apparatus based on an event camera according to an embodiment of the present disclosure.
As shown in fig. 6, the event camera-based multi-object tracking apparatus includes: an acquisition module 610, a determination module 620, and a tracking module 630.
An obtaining module 610, configured to obtain the number of trigger events in each sampling period of multiple consecutive sampling periods in a scene to be tested;
a determining module 620, configured to, if the number of trigger events in multiple consecutive sampling periods all exceeds a preset ambient noise threshold, process the trigger event in the current latest sampling period through a DBSCAN clustering algorithm, so as to determine the number of moving objects in the detected scene and an initial position triggered by each of the moving objects;
and the tracking module 630 is configured to determine positions of all moving objects in the detected scene in the event image frame according to the data stream of the trigger event, and track the moving objects according to the event image frame.
According to the event camera-based multi-target tracking device of the embodiment of the present application, the number of trigger events in each sampling period of a plurality of continuous sampling periods in a detected scene is acquired; if the number of trigger events in the plurality of continuous sampling periods all exceeds a preset environmental noise threshold, the trigger events in the current latest sampling period are processed through a DBSCAN clustering algorithm to determine the number of moving objects in the detected scene and the initial position triggered by each moving object; and event image frames containing all moving objects in the detected scene are determined according to the data stream of trigger events, and the moving objects are tracked according to the event image frames. The device can quickly and accurately track and position multiple moving objects in the detected scene while ensuring low latency and an extremely low computational load, reducing power consumption, and allowing deployment on mobile platforms.
It should be noted that the above explanation of the embodiment of the multi-target tracking method based on the event camera is also applicable to the multi-target tracking apparatus based on the event camera of the embodiment, and is not repeated herein.
In order to implement the foregoing embodiments, the present application further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the method as described in the first embodiment of the present application is implemented.
It should be noted that, in the description of the present application, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, in the description of the present application, "a plurality" means two or more unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (10)

1. A multi-target tracking method based on an event camera is characterized by comprising the following steps:
acquiring the number of trigger events in each sampling period of a plurality of continuous sampling periods in a detected scene;
if the number of trigger events in a plurality of continuous sampling periods exceeds a preset environmental noise threshold value, processing the trigger events in the current latest sampling period through a DBSCAN clustering algorithm to determine the number of moving objects in the detected scene and the initial position triggered by each moving object;
and determining event image frames containing all moving objects in the detected scene according to the data stream of the trigger event, and tracking the moving objects according to the event image frames.
2. The method of claim 1, wherein prior to acquiring the number of trigger events in each of the plurality of sampling periods in the scene under test, further comprising:
acquiring the number of trigger events in each sampling period of a plurality of continuous sampling periods in the scene to be tested, wherein the scene to be tested is a static scene;
and calculating the variance of the number of the trigger events in a plurality of continuous sampling periods, and if the variance is smaller than a preset threshold, determining the preset environmental noise threshold according to the mean value of the number of the trigger events in the plurality of continuous sampling periods.
3. The method according to claim 1 or 2, wherein before said processing the triggering events within the plurality of sampling periods by the DBSCAN clustering algorithm, comprising:
and determining the radius of the adjacent area matched with the trigger event and the preset number of the trigger events in the area of the adjacent area radius.
4. The method according to claim 3, wherein the processing the trigger events in the consecutive sampling periods through the DBSCAN clustering algorithm to determine the number of moving objects in the detected scene and the initial position triggered by each of the moving objects comprises:
determining the actual number of the trigger events contained in the adjacent area radius by any trigger event in the trigger events through the DBSCAN clustering algorithm, wherein if the actual number is not less than the preset number, the category of any trigger event is determined to be a core point;
if the actual number is smaller than the preset number, but the trigger event is within the adjacent area radius of any core point, determining that the category of any trigger event is an edge point;
if the actual number is smaller than the preset number and the trigger event is not within the radius of the adjacent area of any core point, determining that the category of any trigger event is a noise point;
and determining the number of moving objects in the detected scene and the initial position triggered by each moving object according to the noise points, the core points and the edge points.
5. The method of claim 4, wherein the category of any trigger event is determined by:
$$\mathrm{category}(p)=\begin{cases}\text{core}, & \mathrm{COUNT}(N(p,\varepsilon)) \ge \text{minPts}\\ \text{border}, & \mathrm{COUNT}(N(p,\varepsilon)) < \text{minPts}\ \text{and}\ p \in N(q,\varepsilon)\ \text{for some core point } q\\ \text{noise}, & \text{otherwise}\end{cases}$$

wherein p is the trigger event being classified, core is the core point, border is the edge point, noise is the noise point, ε is the radius of the adjacent area matched with the trigger event, COUNT(N(p, ε)) is the actual number of trigger events contained in the area of radius ε around point p, and minPts is the preset number of trigger events required in the area of radius ε.
6. The method of claim 4, further comprising:
and carrying out weight distribution on the DBSCAN clustering result of the initial position triggered by each moving object:
if any trigger event is the core point of the initial position, the corresponding weight of the point is the first weight,
and if any trigger event is the edge point of the initial position, the corresponding weight of the point is a second weight, wherein the first weight is greater than the second weight.
7. The method of claim 6, wherein said tracking a moving object from said event image frames comprises:
determining the moving direction and distance of a moving object in the event image frames by adopting an optical flow error algorithm, wherein the event image frames comprise a first event image frame and a second event image frame;
the determining the moving direction and the distance of the moving object corresponding to the event image frame by adopting the optical flow error algorithm specifically comprises the following steps:
determining a first coordinate position of a center of mass point of the moving object in the first event image frame;
determining a second coordinate position of the centroid point in the second event image frame after one or more of the preset sampling periods;
and determining the motion component of the center of mass point in the coordinate axis direction according to the first coordinate position and the second coordinate position, and determining the motion track of the moving object according to the motion component of the center of mass point.
8. The method of claim 7, wherein determining the motion trajectory of the moving object from the motion components of the centroid points comprises: obtaining the motion component of the centroid point in the coordinate axis direction when the optical flow error is minimum by the following formula to determine the motion track of the moving object:
$$e(d)=e(d_x,d_y)=\sum_{(x,y)\in C}\mathrm{weight}(x,y)\,\bigl(I(x,y)-J(x+d_x,\,y+d_y)\bigr)^2$$

wherein e(d) is the optical flow error, C is the coordinate set of trigger events formed by the DBSCAN clustering result of the initial position triggered by any moving object, (x, y) is the coordinate of any trigger event in C on the first event image frame, dx and dy are the motion components of the centroid point of the moving object along the coordinate axes after one or more preset sampling periods, (x + dx, y + dy) is the position of that trigger event in C on the second event image frame, I(x, y) is the pixel brightness value of that trigger event in C on the first event image frame, J(x + dx, y + dy) is the pixel brightness value of that trigger event in C on the second event image frame, and weight(x, y) is the weight coefficient, which takes different values for core points and edge points.
9. An event camera based multi-target tracking apparatus, the apparatus comprising:
the acquisition module is used for acquiring the number of trigger events in each sampling period of a plurality of continuous sampling periods in a detected scene;
the determining module is used for processing the triggering event in the current latest sampling period through a DBSCAN clustering algorithm to determine the number of moving objects in the detected scene and the initial position triggered by each moving object if the number of the triggering events in a plurality of continuous sampling periods exceeds a preset environmental noise threshold;
and the tracking module is used for determining the positions of all moving objects in the detected scene in the event image frame according to the data stream of the trigger event and tracking the moving objects according to the event image frame.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1-8 when executing the computer program.
CN202110968923.6A 2021-08-23 2021-08-23 Event camera-based multi-target tracking method and device and computer equipment Pending CN113870322A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110968923.6A CN113870322A (en) 2021-08-23 2021-08-23 Event camera-based multi-target tracking method and device and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110968923.6A CN113870322A (en) 2021-08-23 2021-08-23 Event camera-based multi-target tracking method and device and computer equipment

Publications (1)

Publication Number Publication Date
CN113870322A (en)

Family

ID=78987997

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110968923.6A Pending CN113870322A (en) 2021-08-23 2021-08-23 Event camera-based multi-target tracking method and device and computer equipment

Country Status (1)

Country Link
CN (1) CN113870322A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114842045A (en) * 2022-04-01 2022-08-02 深圳市九天睿芯科技有限公司 Target tracking method and device
CN114842045B (en) * 2022-04-01 2024-04-16 深圳市九天睿芯科技有限公司 Target tracking method and device
CN117036448A (en) * 2023-10-10 2023-11-10 深圳纷来智能有限公司 Scene construction method and system of multi-view camera
CN117036448B (en) * 2023-10-10 2024-04-02 深圳纷来智能有限公司 Scene construction method and system of multi-view camera

Similar Documents

Publication Publication Date Title
Akolkar et al. Real-time high speed motion prediction using fast aperture-robust event-driven visual flow
US9767570B2 (en) Systems and methods for computer vision background estimation using foreground-aware statistical models
CN109886994B (en) Self-adaptive occlusion detection system and method in video tracking
CN115880784B (en) Scenic spot multi-person action behavior monitoring method based on artificial intelligence
JP7272024B2 (en) Object tracking device, monitoring system and object tracking method
CN113870322A (en) Event camera-based multi-target tracking method and device and computer equipment
JP2013537661A (en) Automatic detection of moving objects using stereo vision technology
CN108764338B (en) Pedestrian tracking method applied to video analysis
KR20210101286A (en) How to track objects in a scene
CN112966654B (en) Lip movement detection method, lip movement detection device, terminal equipment and computer readable storage medium
CN118096815B (en) Road abnormal event detection system based on machine vision
JP2018139086A (en) Correlation tracking device, correlation tracking method and correlation tracking program
Makino et al. Moving-object detection method for moving cameras by merging background subtraction and optical flow methods
CN115331151A (en) Video speed measuring method and device, electronic equipment and storage medium
JP2020109644A (en) Fall detection method, fall detection apparatus, and electronic device
CN111274852A (en) Target object key point detection method and device
CN114639159A (en) Moving pedestrian detection method, electronic device and robot
Wojke et al. Joint operator detection and tracking for person following from mobile platforms
CN116862832A (en) Three-dimensional live-action model-based operator positioning method
CN103093481A (en) Moving object detection method under static background based on watershed segmentation
CN117173324A (en) Point cloud coloring method, system, terminal and storage medium
CN107067411B (en) Mean-shift tracking method combined with dense features
KR101210866B1 (en) An object tracking system based on a PTZ(Pan-Tilt-Zoom) camera using Mean-shift algorithm
CN113761981B (en) Automatic driving visual perception method, device and storage medium
Mrithu et al. An efficient implementation of video based traffic analysis system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination