CN106971401B - Multi-target tracking device and method - Google Patents


Info

Publication number
CN106971401B
CN106971401B (application CN201710203912.2A)
Authority
CN
China
Prior art keywords
tracking
target
module
detection
queue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710203912.2A
Other languages
Chinese (zh)
Other versions
CN106971401A (en)
Inventor
邹李兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd filed Critical Lenovo Beijing Ltd
Priority to CN201710203912.2A priority Critical patent/CN106971401B/en
Publication of CN106971401A publication Critical patent/CN106971401A/en
Application granted granted Critical
Publication of CN106971401B publication Critical patent/CN106971401B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Abstract

The application provides a multi-target tracking device and a multi-target tracking method. The multi-target tracking apparatus includes: a tracking module configured to receive video frame data, acquire indication data indicating a plurality of objects to be tracked, and track the plurality of objects in each frame of the video frame data according to the indication data so as to generate a tracking result; a detection module configured to receive the video frame data and detect the plurality of objects in each frame of the video frame data to generate a detection result; a learning module configured to receive the video frame data, update the detection module according to the tracking result of the tracking module, and update the tracking module according to the detection result of the detection module; and an integration module configured to receive the tracking result of the tracking module and the detection result of the detection module to generate indication data indicating the plurality of targets being tracked.

Description

Multi-target tracking device and method
Technical Field
The application relates to a multi-target tracking device and method.
Background
TLD (Tracking-Learning-Detection) is a novel long-term tracking algorithm for a single target. TLD differs markedly from traditional tracking methods in that it combines a conventional tracking algorithm with a detection algorithm to handle deformation, partial occlusion, and similar changes of the tracked target during tracking, making the tracking more stable, robust, and reliable. However, the original TLD is a single-target tracking method and is not applicable in multi-target tracking scenarios.
For this reason, it is desirable to provide a multi-target tracking apparatus and method capable of tracking a plurality of targets.
Disclosure of Invention
According to an embodiment of the present application, there is provided a multi-target tracking apparatus including:
a tracking module configured to receive video frame data and acquire indication data indicating a plurality of objects to be tracked, and track the plurality of objects in each frame of the video frame data according to the indication data of the plurality of objects so as to generate a tracking result;
a detection module configured to receive the video frame data and detect the plurality of objects in each frame of the video frame data to generate a detection result;
a learning module configured to receive the video frame data, update the detection module according to the tracking result of the tracking module, and update the tracking module according to the detection result of the detection module; and
an integration module configured to receive a tracking result of the tracking module and a detection result of the detection module to generate indication data indicating a plurality of targets being tracked.
Optionally, the tracking module comprises a multi-target tracking manager configured to:
performing addition of a tracking target according to a predetermined operation;
performing an update of a plurality of tracking targets; and
managing a bidirectional mapping relationship between a first queue of tracking target frames of the plurality of tracking targets and a second queue of target-region feature points of the plurality of tracking targets.
Optionally, the multi-target tracking manager is further configured to:
tracking all feature points in the second queue of target-region feature points using an optical flow method so as to generate a tracking result;
determining, from the tracking result and the reverse mapping between feature points and tracking target frames, whether tracking succeeded for the tracking target frame of each of the plurality of tracking targets; and
if tracking succeeded, recalculating the tracking target frame and updating its feature points; otherwise, deleting the tracking target frame and its corresponding feature points from the first queue and the second queue, respectively.
Optionally, the learning module comprises:
a sample queue generator configured to generate a positive sample queue and a negative sample queue, wherein the positive sample queue includes, for each frame of the video frame data, the tracking target frame of each tracking target tracked by the tracking module or the detection target frame of each tracking target detected by the detection module; the negative sample queue stores common negative samples, a negative sample being a region within a predetermined range around a positive sample that does not intersect that positive sample; and wherein, for each of the plurality of tracking targets, its positive sample queue includes the tracking target frame and the detection target frame of that target, and its negative sample queue includes the common negative samples together with the positive samples belonging to the positive sample queues of the other tracking targets; and
a classifier configured to perform a similarity comparison against the positive sample queue and the negative sample queue of each tracking target; obtain, according to a set threshold, the counts and attributions of matches in the positive and negative sample queues and normalize them; and calculate the probability that a candidate belongs to each tracked object as a positive or negative sample.
Optionally, the detection module is further configured to:
for each frame of the video frame data, sliding a detection box to detect negative samples first;
determining the regions in which negative samples are detected as subsequent non-detection regions; and
performing positive sample detection on the remaining regions to determine the tracking targets.
According to another embodiment of the present application, there is provided a multi-target tracking method including:
receiving, by a tracking module, video frame data and acquiring indication data indicating a plurality of objects to be tracked, and tracking the plurality of objects in each frame of the video frame data according to the indication data of the plurality of objects to generate a tracking result;
receiving, by a detection module, the video frame data and detecting the plurality of objects in each frame of the video frame data to generate a detection result;
receiving the video frame data through a learning module, updating the detection module according to the tracking result of the tracking module, and updating the tracking module according to the detection result of the detection module; and
receiving, by an integration module, a tracking result of the tracking module and a detection result of the detection module to generate indication data indicating a plurality of targets being tracked.
Optionally, the tracking module comprises a multi-target tracking manager configured to:
performing addition of a tracking target according to a predetermined operation;
updating a plurality of tracking targets according to a predetermined operation; and
managing a bidirectional mapping relationship between a first queue of tracking target frames of the plurality of tracking targets and a second queue of target-region feature points of the plurality of tracking targets.
Optionally, the multi-target tracking manager is further configured to:
tracking all feature points in the second queue of target-region feature points using an optical flow method so as to generate a tracking result;
determining, from the tracking result and the reverse mapping between feature points and tracking target frames, whether tracking succeeded for the tracking target frame of each of the plurality of tracking targets; and
if tracking succeeded, recalculating the tracking target frame and updating its feature points; otherwise, deleting the tracking target frame and its corresponding feature points from the first queue and the second queue, respectively.
Optionally, the learning module comprises:
a sample queue generator configured to generate a positive sample queue and a negative sample queue, wherein the positive sample queue includes, for each frame of the video frame data, the tracking target frame of each tracking target tracked by the tracking module or the detection target frame of each tracking target detected by the detection module; the negative sample queue stores common negative samples, a negative sample being a region within a predetermined range around a positive sample that does not intersect that positive sample; and wherein, for each of the plurality of tracking targets, its positive sample queue includes the tracking target frame and the detection target frame of that target, and its negative sample queue includes the common negative samples together with the positive samples belonging to the positive sample queues of the other tracking targets; and
a classifier configured to perform a similarity comparison against the positive sample queue and the negative sample queue of each tracking target; obtain, according to a set threshold, the counts and attributions of matches in the positive and negative sample queues and normalize them; and calculate the probability that a candidate belongs to each tracked object as a positive or negative sample.
Optionally, the detection module is further configured to:
for each frame of the video frame data, sliding a detection box to detect negative samples first;
determining the regions in which negative samples are detected as subsequent non-detection regions; and
performing positive sample detection on the remaining regions to determine the tracking targets.
Therefore, by using the multi-target tracking device and method according to the embodiment of the application, a plurality of targets can be tracked.
Drawings
FIG. 1 is a block diagram illustrating a prior art TLD algorithm;
FIG. 2 is a block diagram illustrating a functional configuration of a multi-target tracking device according to an embodiment of the present application;
FIG. 3 is a diagram illustrating a mapping relationship of a trace target box queue and a feature point queue;
FIG. 4 is a schematic diagram illustrating a positive sample queue and a negative sample queue; and
FIG. 5 is a flow diagram illustrating a multi-target tracking method according to an embodiment of the application.
Detailed Description
Before describing the multi-target tracking apparatus and the multi-target tracking method according to the embodiment of the present application, the TLD algorithm is first briefly introduced.
As is well known to those skilled in the art, the TLD algorithm is a long-term, online-learning, single-target tracking method requiring minimal a-priori information. The TLD algorithm consists essentially of three parts: a tracker, a detector, and a learner. The tracker is a short-term adaptive tracker that predicts the motion of the target between consecutive frames, under the assumption that inter-frame motion is limited and the target remains visible. The detector is an efficient cascade classifier that builds and uses simple but effective image features to detect the target in real time, correcting the tracker when necessary. The learner evaluates the performance of the tracker and the detector and updates the detector by generating valid training samples and eliminating detector errors.
The framework of the TLD algorithm is shown in fig. 1. In the initial frame of tracking the target, the initialization of the TLD algorithm is completed by giving the position and the size of the target. In the subsequent tracking process, each frame image is processed in parallel by the tracker and the detector together.
Specifically, the tracker 101 estimates the position of the target in the current frame from the target's position in the previous frame, while the detector 103 scans the current frame globally with a sliding window to detect one or more possible target positions; both results are input to the integrator 104. The integrator 104 outputs whether the current frame contains the target, the target position, and whether the tracking trajectory up to the current frame is valid. These integrated results, together with the detection and tracking results, are input to the learner 102, which updates the tracker and the detector.
However, one drawback of the existing TLD algorithm is that it can track only a single target. The present application therefore improves the existing TLD algorithm, achieving multi-target tracking by separately improving its tracker, detector, and learner.
Hereinafter, a multi-target tracking apparatus according to a first embodiment of the present application will be described in detail with reference to the accompanying drawings. As shown in fig. 2, the multi-target tracking apparatus 200 according to the first embodiment of the present application includes:
a tracking module 201 configured to receive video frame data and acquire indication data indicating a plurality of objects to be tracked, and track the plurality of objects in each frame of the video frame data according to the indication data of the plurality of objects so as to generate a tracking result;
a detection module 202 configured to receive the video frame data and detect the plurality of objects in each frame of the video frame data to generate a detection result;
a learning module 203 configured to receive the video frame data, update the detection module according to the tracking result of the tracking module, and update the tracking module according to the detection result of the detection module; and
an integration module 204 configured to receive the tracking result of the tracking module and the detection result of the detection module to generate indication data indicating the tracked plurality of targets.
Unlike the existing TLD, the tracking module 201 according to the embodiment of the present application acquires indication data indicating a plurality of objects to be tracked in addition to receiving video frame data.
The indication data may be a tracking box indicating the position and size of a target, and may also include a flag identifying whether the tracked target is visible.
In one embodiment, for each of the plurality of tracking targets, a rectangle containing the tracking target may be manually determined in the first frame image, and initial coordinates and width and height information of the rectangle may be obtained as the indication data.
In another embodiment, the information of the object to be tracked may be stored in advance, the object to be tracked may be identified in the first frame image by means of image identification, a rectangle containing the tracking target may be generated, and the initial coordinates and the width and height information of the rectangle may be obtained as the indication data.
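As a minimal sketch of the indication data described above (the class and field names below are our own, not defined in the application), each target can be represented as a rectangle of initial coordinates plus width/height together with a visibility flag:

```python
from dataclasses import dataclass

# Hypothetical container for the "indication data": a rectangle
# (x, y, width, height) and a visibility flag for each target.
@dataclass
class IndicationData:
    target_id: int
    x: int
    y: int
    w: int
    h: int
    visible: bool = True

# Two targets, determined manually or via image recognition in the first frame.
targets = [IndicationData(1, 10, 20, 50, 80), IndicationData(2, 200, 40, 60, 90)]
print([t.target_id for t in targets])  # ids of the targets to be tracked
```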
The tracking module 201 may also include a multi-target tracking manager 2011.
The multi-target tracking manager 2011 is configured to: perform addition of a tracking target according to a predetermined operation; perform updates of a plurality of tracking targets; and manage a bidirectional mapping relationship between a first queue of the tracking target frames of the plurality of tracking targets and a second queue of the target-region feature points of the plurality of tracking targets.
Specifically, during tracking, a rectangle containing a new tracking target may be manually drawn in the current frame image and its initial coordinates and width/height obtained as indication data; the multi-target tracking manager 2011 then adds the manually determined target as a target to be tracked next.
Alternatively, the multi-target tracking manager 2011 may add a new tracking target by passing information about it to the tracker, so that a rectangle containing the target is generated in the current frame image during tracking and its initial coordinates and width/height are obtained as indication data.
In addition, the multi-target tracking manager 2011 may perform updates of multiple tracking targets. The updating includes recalculating the tracking frames of the tracking targets and the corresponding feature points.
The multi-target tracking manager 2011 may further manage a bidirectional mapping relationship between a first queue of the tracking target frames of the plurality of tracking targets and a second queue of the target area feature points of the plurality of tracking targets.
The bidirectional mapping relationship will be described in detail below with reference to fig. 3.
As shown in fig. 3, for a plurality of targets to be tracked (assume N targets, N being an integer greater than 1), a tracking frame is generated for each target. The tracking frames of the plurality of targets form a first queue, i.e., Obj1, Obj2, …, ObjN in fig. 3.
The feature points are a number of pixel points selected within a tracking frame of the video data frame. As shown in fig. 3, each of the tracking frames Obj1, Obj2, …, ObjN contains a plurality of pixel points serving as feature points. Each tracking frame and its corresponding feature points have a bidirectional mapping relationship, through which the corresponding tracking frame or feature points can be determined.
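The bidirectional mapping above can be sketched as follows (a simplified illustration with hypothetical names, not the application's implementation): a forward map from each target to its box and feature points, plus a reverse map from each feature point back to its owning target.

```python
# Minimal sketch of the manager's bookkeeping: a first queue of tracking
# boxes, a second queue of feature points, and a reverse map so a surviving
# feature point can be traced back to its tracking frame.
class MultiTargetTrackingManager:
    def __init__(self):
        self.boxes = {}        # first queue: target_id -> (x, y, w, h)
        self.points = {}       # second queue: target_id -> [(px, py), ...]
        self.point_owner = {}  # reverse map: (px, py) -> target_id

    def add_target(self, target_id, box, feature_points):
        self.boxes[target_id] = box
        self.points[target_id] = list(feature_points)
        for p in feature_points:
            self.point_owner[p] = target_id

    def remove_target(self, target_id):
        # Delete the box and its feature points from both queues.
        for p in self.points.pop(target_id, []):
            self.point_owner.pop(p, None)
        self.boxes.pop(target_id, None)

mgr = MultiTargetTrackingManager()
mgr.add_target(1, (10, 10, 40, 60), [(15, 20), (30, 40)])
mgr.add_target(2, (100, 10, 40, 60), [(110, 20)])
print(mgr.point_owner[(30, 40)])  # -> 1 (reverse mapping)
mgr.remove_target(1)
print(sorted(mgr.boxes))          # -> [2]
```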
In addition, the multi-target tracking manager 2011 is further configured to:
tracking all feature points in the second queue of target-region feature points using an optical flow method so as to generate a tracking result;
determining, from the tracking result and the reverse mapping between feature points and tracking target frames, whether tracking succeeded for the tracking target frame of each of the plurality of tracking targets; and
if tracking succeeded, recalculating the tracking target frame and updating its feature points; otherwise, deleting the tracking target frame and its corresponding feature points from the first queue and the second queue, respectively.
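The per-target update step can be sketched as below. This is an illustrative simplification under our own assumptions: the optical flow itself (e.g. pyramidal Lucas-Kanade) is assumed to run elsewhere and report each point as moved or lost, and the box is recomputed here by the median point displacement; the survivor threshold is an invented parameter.

```python
from statistics import median

def update_target(box, old_pts, new_pts, min_survivors=2):
    """old_pts/new_pts are parallel lists; a point lost by optical flow
    is None in new_pts. Returns (new_box, surviving_points) on success,
    or None when tracking failed and the target should be deleted from
    both queues."""
    pairs = [(o, n) for o, n in zip(old_pts, new_pts) if n is not None]
    if len(pairs) < min_survivors:
        return None  # tracking failed for this target
    # Shift the box by the median displacement of the surviving points.
    dx = median(n[0] - o[0] for o, n in pairs)
    dy = median(n[1] - o[1] for o, n in pairs)
    x, y, w, h = box
    return (x + dx, y + dy, w, h), [n for _, n in pairs]

result = update_target((10, 10, 40, 60),
                       [(15, 20), (30, 40), (20, 50)],
                       [(17, 21), (32, 41), None])  # third point lost
box, pts = result
print(box)  # box shifted by the median displacement (2, 1)
```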
In addition, the learning module 203 includes a sample queue generator 2031 and a classifier 2032.
The sample queue generator 2031 is configured to generate a positive sample queue and a negative sample queue. The positive sample queue includes a tracking target frame of each tracking target tracked by the tracking module 201 or a detection target frame of each tracking target detected by the detection module 202 in each frame of the video frame data.
Specifically, in each frame, when the confidence of the tracking target frame tracked by the tracking module 201 is high, that tracking frame may be selected as a positive sample. When the confidence of the tracking frame is low but the confidence of the object frame detected by the detection module 202 is high, the detected frame may be selected as the positive sample. When both confidences are low, no positive sample is selected from that frame.
Thus, as shown in fig. 4, for Obj1, Obj2, …, ObjN, up to the current frame the positive samples of tracked object 1 include P11, P12, P13, P14, and P15; the positive samples of tracked object 2 include P21, P22, and P23; and so on, with the positive samples of tracked object N including Pn1 and Pn2.
On the other hand, as shown in fig. 4, the negative sample queue stores common negative samples. A negative sample is a region within a predetermined range around a positive sample that does not intersect that positive sample. For example, for positive sample P11 of tracked object 1, a region of, say, 10 pixels around P11 may be taken as a negative sample. The predetermined range may be set freely as needed.
Further, note that in the present embodiment, for each of the plurality of tracked targets, the positive sample queue includes the tracking target frame or detection target frame of that target obtained as described above, while its negative sample queue includes, in addition to the common negative samples, the positive samples belonging to the positive sample queues of the other tracked targets. That is, for a particular tracked object, the positive samples of the other tracked objects also serve as negative samples.
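The cross-target sample rule above can be sketched directly (helper name and the sample labels are illustrative, echoing the P11/P21 notation of fig. 4): for a given target, its negatives are the common negative queue plus every other target's positives.

```python
# Per-target negative set: common negatives + all other targets' positives.
def negatives_for(target_id, positive_queues, common_negatives):
    negs = list(common_negatives)
    for other_id, samples in positive_queues.items():
        if other_id != target_id:
            negs.extend(samples)
    return negs

positive_queues = {
    1: ["P11", "P12", "P13"],  # positives of tracked object 1
    2: ["P21", "P22"],         # positives of tracked object 2
}
common_negatives = ["N1", "N2"]

print(negatives_for(1, positive_queues, common_negatives))
# common negatives plus object 2's positives
```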
The classifier 2032 may be, for example, a multi-class K-nearest-neighbor (KNN) classifier. The KNN classifier performs a similarity comparison against the positive and negative sample queues of each tracked target; then, according to a set threshold, it obtains the counts and attributions of matches in the positive and negative sample queues and normalizes them to compute the probability that a candidate belongs to tracked object ObjN as a positive or negative sample, where the negative samples comprise those in the negative sample queue together with all positive samples in the positive sample queues not belonging to ObjN.
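A minimal sketch of that normalized vote (the similarity metric and the threshold value are placeholders of ours; the patent does not fix them): count positive and negative neighbours whose similarity clears the threshold, then normalize the counts into a probability.

```python
# KNN-style normalized vote: similarity scores against the positive and
# negative sample queues of one tracked object.
def knn_probability(similarities_pos, similarities_neg, threshold=0.6):
    n_pos = sum(1 for s in similarities_pos if s >= threshold)
    n_neg = sum(1 for s in similarities_neg if s >= threshold)
    total = n_pos + n_neg
    if total == 0:
        return 0.0  # no confident neighbour on either side
    return n_pos / total

# 2 positive matches vs 1 negative match clear the threshold.
print(knn_probability([0.9, 0.7, 0.3], [0.65, 0.2]))  # about 0.667
```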
The detection module 202 is further configured to, for each frame of the video frame data, slide a detection box to detect negative samples first. Unlike the prior-art TLD algorithm, because the sample queue generator 2031 has already generated the negative sample queue, the detection module 202 can shrink the area to be detected by first detecting negative samples and filtering out the regions where they occur.
For example, if the detection box finds a similarity to the negative samples greater than 70%, the region may be deemed to contain no tracked object, i.e., the region in which negative samples are detected is marked as a subsequent non-detection region.
Finally, positive sample detection is performed on the remaining area of the video frame to determine the tracking targets. Thus, compared with simply providing a separate detector for each target, the number of detectors in this embodiment does not increase: there is only one detector. This greatly saves the computing and storage resources of the tracking system.
On the other hand, by filtering out the negative sample regions first, the detection area of an image frame can be greatly reduced, improving the detector's performance.
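The two-pass detection above can be sketched as follows. The grid of windows, the scoring functions, and the 0.7/0.5 thresholds are illustrative stand-ins (only the 70% negative-similarity figure comes from the text); a first pass marks negative-matching windows as non-detection regions, and the second pass runs positive detection only on what remains.

```python
# Two-pass sliding-window detection: filter negative regions, then detect.
def detect(windows, neg_similarity, pos_score, neg_thresh=0.7):
    skipped, candidates = set(), []
    for w in windows:
        if neg_similarity(w) > neg_thresh:
            skipped.add(w)  # subsequent non-detection region
    for w in windows:
        if w not in skipped and pos_score(w) > 0.5:
            candidates.append(w)
    return candidates, skipped

wins = [(0, 0), (32, 0), (64, 0)]  # top-left corners of sliding windows
cands, skip = detect(
    wins,
    neg_similarity=lambda w: 0.9 if w == (0, 0) else 0.1,
    pos_score=lambda w: 0.8 if w == (32, 0) else 0.2,
)
print(cands, sorted(skip))  # one detection; one window filtered out early
```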
As with the prior art TLD algorithm, the integration module 204 receives the tracking results of the tracking module and the detection results of the detection module to generate indication data indicative of the plurality of targets being tracked.
Specifically, the integration module 204 integrates the target frames obtained by the detector and the tracker and provides the result as the final output of the multi-target tracking apparatus. If neither the tracker nor the detector obtains a target frame, the integration module 204 determines that the tracked target is not present in the current frame; otherwise, it takes the target frame with the greatest conservative similarity as the final target frame position.
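That integration rule is small enough to sketch directly (the conservative-similarity numbers below are invented for illustration; how the score is computed is not specified here):

```python
# Integration: absent if no module produced a box; otherwise pick the box
# with the greatest conservative similarity.
def integrate(tracker_box, detector_boxes):
    # Each candidate is ((x, y, w, h), conservative_similarity) or None.
    candidates = [b for b in [tracker_box, *detector_boxes] if b is not None]
    if not candidates:
        return None  # target not present in the current frame
    return max(candidates, key=lambda b: b[1])[0]

final = integrate(((10, 10, 40, 60), 0.72),
                  [((12, 11, 40, 60), 0.81), ((200, 5, 40, 60), 0.30)])
print(final)  # the detector box with the highest conservative similarity
```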
Therefore, by using the multi-target tracking device according to the embodiment of the application, a plurality of targets can be tracked.
< second embodiment >
Next, a multi-target tracking method according to a second embodiment of the present application will be described with reference to fig. 5.
As shown in fig. 5, the multi-target tracking method 500 according to the present embodiment includes:
step S501: receiving, by a tracking module, video frame data and acquiring indication data indicating a plurality of objects to be tracked, and tracking the plurality of objects in each frame of the video frame data according to the indication data of the plurality of objects to generate a tracking result;
step S502: receiving, by a detection module, the video frame data and detecting the plurality of objects in each frame of the video frame data to generate a detection result;
step S503: receiving the video frame data through a learning module, updating the detection module according to the tracking result of the tracking module, and updating the tracking module according to the detection result of the detection module; and
step S504: receiving, by an integration module, a tracking result of the tracking module and a detection result of the detection module to generate indication data indicating a plurality of targets being tracked.
The multi-target tracking method 500 according to the present embodiment can be implemented by the multi-target tracking apparatus 200 according to the first embodiment. Accordingly, detailed descriptions of the respective modules for performing the respective steps of the multi-target tracking method are omitted herein.
Optionally, the tracking module comprises a multi-target tracking manager configured to:
performing addition of a tracking target according to a predetermined operation;
updating a plurality of tracking targets according to a predetermined operation; and
managing a bidirectional mapping relationship between a first queue of tracking target frames of the plurality of tracking targets and a second queue of target-region feature points of the plurality of tracking targets.
Optionally, the multi-target tracking manager is further configured to:
tracking all feature points in the second queue of target-region feature points using an optical flow method so as to generate a tracking result;
determining, from the tracking result and the reverse mapping between feature points and tracking target frames, whether tracking succeeded for the tracking target frame of each of the plurality of tracking targets; and
if tracking succeeded, recalculating the tracking target frame and updating its feature points; otherwise, deleting the tracking target frame and its corresponding feature points from the first queue and the second queue, respectively.
Optionally, the learning module comprises:
a sample queue generator configured to generate a positive sample queue and a negative sample queue, wherein the positive sample queue includes, for each frame of the video frame data, the tracking target frame of each tracking target tracked by the tracking module or the detection target frame of each tracking target detected by the detection module; the negative sample queue stores common negative samples, a negative sample being a region within a predetermined range around a positive sample that does not intersect that positive sample; and wherein, for each of the plurality of tracking targets, its positive sample queue includes the tracking target frame and the detection target frame of that target, and its negative sample queue includes the common negative samples together with the positive samples belonging to the positive sample queues of the other tracking targets; and
a classifier configured to perform a similarity comparison against the positive sample queue and the negative sample queue of each tracking target; obtain, according to a set threshold, the counts and attributions of matches in the positive and negative sample queues and normalize them; and calculate the probability that a candidate belongs to each tracked object as a positive or negative sample.
Optionally, the detection module is further configured to:
for each frame of the video frame data, sliding a detection box to detect negative samples first;
determining the regions in which negative samples are detected as subsequent non-detection regions; and
performing positive sample detection on the remaining regions to determine the tracking targets.
Therefore, by using the multi-target tracking method according to the embodiment of the application, a plurality of targets can be tracked.
It is to be noted that the above embodiments are merely examples, and the present invention is not limited to such examples, but may be variously modified.
It should be noted that, in the present specification, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element.
Finally, it should be noted that the series of processes described above includes not only processes performed in time series in the order described herein, but also processes performed in parallel or individually rather than strictly in that order.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by software together with a necessary hardware platform, or entirely by hardware. Based on this understanding, all of the technical solutions of the present invention, or the part that contributes over the prior art, may be embodied in the form of a software product. The software product may be stored in a storage medium, such as a ROM (read-only memory)/RAM (random access memory), a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the method according to each embodiment, or certain parts of the embodiments, of the present invention.
The present invention has been described in detail above; its principles and embodiments are explained herein using specific examples, which are intended only to help in understanding the method and core idea of the invention. Meanwhile, a person skilled in the art may, following the idea of the present invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A multi-target tracking apparatus, comprising:
a tracking module configured to receive video frame data and acquire indication data indicating a plurality of objects to be tracked, and track the plurality of objects in each frame of the video frame data according to the indication data of the plurality of objects so as to generate a tracking result;
a detection module configured to receive the video frame data and detect the plurality of objects in each frame of the video frame data to generate a detection result;
a learning module configured to receive the video frame data, update the detection module according to the tracking result of the tracking module, and update the tracking module according to the detection result of the detection module; and
an integration module configured to receive a tracking result of the tracking module and a detection result of the detection module to generate indication data indicating a plurality of targets being tracked,
the learning module includes:
a sample queue generator configured to generate, for each of the plurality of tracking targets, a respective positive sample queue and a respective negative sample queue, wherein the negative sample queue comprises stored common negative samples and the positive samples belonging to the positive sample queues of the other tracking targets.
2. The apparatus of claim 1, wherein the tracking module comprises a multi-target tracking manager configured to:
performing addition of a tracking target according to a predetermined operation;
performing an update of a plurality of tracking targets; and
managing a bidirectional mapping relationship between first queues of tracking target frames of the plurality of tracking targets and second queues of target area feature points of the plurality of tracking targets.
3. The apparatus of claim 2, wherein the multi-target tracking manager is further configured to:
tracking all feature points in the second queue of target area feature points by an optical flow method to generate a tracking result;
determining, according to the tracking result and the reverse mapping relationship between the feature points and the tracking target frames, whether the tracking target frame of each of the plurality of tracking targets is successfully tracked; and
if tracking is successful, recalculating the tracking target frame and updating the feature points; otherwise, deleting the tracking target frame of the tracking target and the corresponding feature points from the first queue and the second queue, respectively.
4. The apparatus of claim 3, wherein the positive sample queue includes, in each frame of the video frame data, the tracking target frame of each tracking target tracked by the tracking module or the detection target frame of each tracking target detected by the detection module, and a negative sample is a region within a predetermined range around a positive sample that does not intersect the positive sample, wherein, for each of the plurality of tracking targets, the positive sample queue includes the tracking target frame and the detection target frame of that tracking target,
the learning module further comprises:
a classifier configured to perform a similarity comparison between the positive sample queue and the negative sample queue of each tracking target; acquire the number and attribution of samples in the positive and negative sample queues according to a set threshold and perform normalization processing; and calculate the probabilities of positive and negative samples belonging to each tracking target.
5. The apparatus of claim 4, wherein the detection module is further configured to:
for each frame of the video frame data, sliding a detection box to detect negative samples;
determining the area in which a negative sample is detected as a subsequent non-detection area; and
performing positive sample detection on the remaining regions to determine a tracking target.
6. A multi-target tracking method, comprising:
receiving, by a tracking module, video frame data and acquiring indication data indicating a plurality of objects to be tracked, and tracking the plurality of objects in each frame of the video frame data according to the indication data of the plurality of objects to generate a tracking result;
receiving, by a detection module, the video frame data and detecting the plurality of objects in each frame of the video frame data to generate a detection result;
receiving, by a learning module, the video frame data, updating the detection module according to the tracking result of the tracking module, and updating the tracking module according to the detection result of the detection module; and
receiving, by an integration module, a tracking result of the tracking module and a detection result of the detection module to generate indication data indicating a plurality of targets being tracked,
the learning module includes:
a sample queue generator configured to generate, for each of the plurality of tracking targets, a respective positive sample queue and a respective negative sample queue, wherein the negative sample queue comprises stored common negative samples and the positive samples belonging to the positive sample queues of the other tracking targets.
7. The method of claim 6, wherein the tracking module comprises a multi-target tracking manager configured to:
performing addition of a tracking target according to a predetermined operation;
updating a plurality of tracking targets according to a preset operation; and
managing a bidirectional mapping relationship between first queues of tracking target frames of the plurality of tracking targets and second queues of target area feature points of the plurality of tracking targets.
8. The method of claim 7, wherein the multi-target tracking manager is further configured to:
tracking all feature points in the second queue of target area feature points by an optical flow method to generate a tracking result;
determining, according to the tracking result and the reverse mapping relationship between the feature points and the tracking target frames, whether the tracking target frame of each of the plurality of tracking targets is successfully tracked; and
if tracking is successful, recalculating the tracking target frame and updating the feature points; otherwise, deleting the tracking target frame of the tracking target and the corresponding feature points from the first queue and the second queue, respectively.
9. The method of claim 8, wherein the positive sample queue includes, in each frame of the video frame data, the tracking target frame of each tracking target tracked by the tracking module or the detection target frame of each tracking target detected by the detection module, and a negative sample is a region within a predetermined range around a positive sample that does not intersect the positive sample, wherein, for each of the plurality of tracking targets, the positive sample queue includes the tracking target frame and the detection target frame of that tracking target,
the learning module further comprises:
a classifier configured to perform a similarity comparison between the positive sample queue and the negative sample queue of each tracking target; acquire the number and attribution of samples in the positive and negative sample queues according to a set threshold and perform normalization processing; and calculate the probabilities of positive and negative samples belonging to each tracking target.
10. The method of claim 9, wherein the detection module is further configured to:
for each frame of the video frame data, sliding a detection box to detect negative samples;
determining the area in which a negative sample is detected as a subsequent non-detection area; and
performing positive sample detection on the remaining regions to determine a tracking target.
CN201710203912.2A 2017-03-30 2017-03-30 Multi-target tracking device and method Active CN106971401B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710203912.2A CN106971401B (en) 2017-03-30 2017-03-30 Multi-target tracking device and method


Publications (2)

Publication Number Publication Date
CN106971401A CN106971401A (en) 2017-07-21
CN106971401B true CN106971401B (en) 2020-09-25

Family

ID=59337132

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710203912.2A Active CN106971401B (en) 2017-03-30 2017-03-30 Multi-target tracking device and method

Country Status (1)

Country Link
CN (1) CN106971401B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108470332B (en) * 2018-01-24 2023-07-07 博云视觉(北京)科技有限公司 Multi-target tracking method and device
CN108734107B (en) * 2018-04-24 2021-11-05 武汉幻视智能科技有限公司 Multi-target tracking method and system based on human face
CN109325961B (en) * 2018-08-27 2021-07-09 北京悦图数据科技发展有限公司 Unmanned aerial vehicle video multi-target tracking method and device
CN109829397B (en) * 2019-01-16 2021-04-02 创新奇智(北京)科技有限公司 Video annotation method and system based on image clustering and electronic equipment
CN110400329B (en) * 2019-06-17 2022-04-05 北京百度网讯科技有限公司 People flow counting method and system
CN111401228B (en) * 2020-03-13 2023-12-19 中科创达软件股份有限公司 Video target labeling method and device and electronic equipment
CN111445501B (en) * 2020-03-25 2023-03-24 苏州科达科技股份有限公司 Multi-target tracking method, device and storage medium
CN113486820B (en) * 2021-07-09 2023-06-06 厦门理工学院 Bidirectional target tracking method and system based on efficient template updating and selecting mechanism
CN113570637B (en) * 2021-08-10 2023-09-19 中山大学 Multi-target tracking method, device, equipment and storage medium
CN117011736A (en) * 2022-04-28 2023-11-07 北京字跳网络技术有限公司 Multi-target tracking method, device, equipment and readable storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101794382A (en) * 2010-03-12 2010-08-04 华中科技大学 Method for counting passenger flow of buses in real time
CN104331901A (en) * 2014-11-26 2015-02-04 北京邮电大学 TLD-based multi-view target tracking device and method
CN105787484A (en) * 2014-12-24 2016-07-20 深圳市Tcl高新技术开发有限公司 Object tracking or identifying method and object tracking or identifying device
CN106250878A (en) * 2016-08-19 2016-12-21 中山大学 A kind of combination visible ray and the multi-modal method for tracking target of infrared image

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN106204649B (en) * 2016-07-05 2019-06-28 西安电子科技大学 A kind of method for tracking target based on TLD algorithm


Also Published As

Publication number Publication date
CN106971401A (en) 2017-07-21

Similar Documents

Publication Publication Date Title
CN106971401B (en) Multi-target tracking device and method
US10885365B2 (en) Method and apparatus for detecting object keypoint, and electronic device
CN107886048B (en) Target tracking method and system, storage medium and electronic terminal
CN108388879B (en) Target detection method, device and storage medium
CN110766724B (en) Target tracking network training and tracking method and device, electronic equipment and medium
US8355576B2 (en) Method and system for crowd segmentation
JP7093427B2 (en) Object tracking methods and equipment, electronic equipment and storage media
US20120250983A1 (en) Object detecting apparatus and method
CN111104925B (en) Image processing method, image processing apparatus, storage medium, and electronic device
CN111512317A (en) Multi-target real-time tracking method and device and electronic equipment
US11783384B2 (en) Computer vision systems and methods for automatically detecting, classifying, and pricing objects captured in images or videos
CN110363165B (en) Multi-target tracking method and device based on TSK fuzzy system and storage medium
CN115375917B (en) Target edge feature extraction method, device, terminal and storage medium
CN110942473A (en) Moving target tracking detection method based on characteristic point gridding matching
CN110490058B (en) Training method, device and system of pedestrian detection model and computer readable medium
CN111932545A (en) Image processing method, target counting method and related device thereof
CN113052019A (en) Target tracking method and device, intelligent equipment and computer storage medium
JP2019021297A (en) Image processing device and method, and electronic apparatus
CN113033305B (en) Living body detection method, living body detection device, terminal equipment and storage medium
KR20150137698A (en) Method and apparatus for movement trajectory tracking of moving object on animal farm
JP2014203133A (en) Image processing device and image processing method
CN111860261A (en) Passenger flow value statistical method, device, equipment and medium
CN111640076A (en) Image completion method and device and electronic equipment
CN111967403A (en) Video moving area determining method and device and electronic equipment
KR102485359B1 (en) Method for enhancing in situ adaptive artfitial intelligent model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant