CN108470332B - Multi-target tracking method and device

Info

Publication number: CN108470332B
Application number: CN201810069852.4A
Other versions: CN108470332A (application publication)
Inventors: 吴婷璇, 陈杰
Assignee: Boyun Vision Beijing Technology Co ltd
Legal status: Active (granted)
Prior art keywords: frame, tracking targets, detection frames

Classifications

    • G06T 7/0002 - Image analysis; inspection of images, e.g. flaw detection
    • G06F 18/214 - Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06T 7/248 - Analysis of motion using feature-based methods, e.g. tracking of corners or segments, involving reference images or patches
    • G06T 2207/10016 - Image acquisition modality: video; image sequence
    • G06T 2207/20081 - Special algorithmic details: training; learning
    • G06T 2207/20084 - Special algorithmic details: artificial neural networks [ANN]
    • G06V 2201/07 - Indexing scheme: target detection


Abstract

The invention provides a multi-target tracking method and apparatus. The method comprises: acquiring candidate detection frames of a plurality of tracking targets in a j-th frame through target detection; associating the candidate detection frames of the plurality of tracking targets in the j-th frame with the regions of interest (ROIs) of the plurality of tracking targets in the (j-1)-th frame to obtain the detection frames corresponding to each tracking target in the j-th frame; determining that at least two of the plurality of tracking targets overlap between the detection frames associated in an i-th frame and that the overlap is canceled between the detection frames associated in the j-th frame; and reclassifying the detection frames associated with the at least two tracking targets in the j-th frame using a classification recognition model, to obtain reclassified detection frames for the at least two tracking targets in the j-th frame. The invention ensures that tracking targets that overlap during tracking are correctly matched to their own positions once the overlap is canceled, thereby ensuring the accuracy of multi-target tracking.

Description

Multi-target tracking method and device
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a multi-target tracking method and a target tracking apparatus.
Background
Target tracking includes single-target tracking and multi-target tracking. Single-target tracking can be performed through appearance modeling or motion modeling of the target to address problems such as illumination changes, deformation, and occlusion. The multi-target tracking problem is much more complex: in addition to the problems encountered in single-target tracking, it requires association and matching between targets.
Multi-target tracking is a research hotspot in the field of computer vision. It refers to using a computer to determine the position, size, and complete motion trajectory of each moving object of interest, each having some distinctive visual characteristic, in a video sequence. It has very wide application in driver-assistance systems, military applications, and intelligent security. Multi-target tracking generally employs one of the following two schemes:
1. First detect the targets, then compute a feature description for each detected target, and then track each target according to its features.
2. For long-term tracking, or tracking when the tracked target changes shape, many practitioners replace tracking with detection: the detections obtained in consecutive frames are associated with one another to obtain the complete trajectory of each target.
Overlap between objects is often encountered in multi-object tracking tasks. After a tracked target overlaps with other targets, its trajectory may be matched incorrectly.
Disclosure of Invention
In view of the above, the present invention provides a multi-target tracking method and apparatus, aimed at solving the problem of incorrectly matched target trajectories in multi-target tracking.
In a first aspect, an embodiment of the present invention provides a multi-target tracking method, including:
acquiring candidate detection frames of a plurality of tracking targets in a j-th frame through target detection;
associating the candidate detection frames of the plurality of tracking targets in the j-th frame with the regions of interest (ROIs) of the plurality of tracking targets in the (j-1)-th frame to obtain the detection frames corresponding to each of the plurality of tracking targets in the j-th frame;
determining that at least two tracking targets of the plurality of tracking targets overlap between the detection frames associated in an i-th frame and that the overlap is canceled between the detection frames associated in a j-th frame;
reclassifying the detection frames associated with the at least two tracking targets in the j-th frame using a classification recognition model to obtain reclassified detection frames for the at least two tracking targets in the j-th frame, so that the reclassified detection frames of the at least two tracking targets in the j-th frame are associated with the ROIs of the at least two tracking targets in the (i-1)-th frame, where i and j are positive integers and i < j.
In one embodiment, the multi-target tracking method of the first aspect further comprises:
establishing a classification recognition model of the plurality of tracking targets;
when it is determined that the at least two tracking targets overlap between the detection frames associated in the i-th frame, updating the classification recognition model using the ROIs of the at least two tracking targets from the frames before the overlap occurred,
wherein reclassifying the detection frames associated with the at least two tracking targets in the j-th frame using the classification recognition model comprises:
reclassifying the detection frames associated with the at least two tracking targets in the j-th frame using the updated classification recognition model.
In one embodiment, in the multi-target tracking method of the first aspect, determining that at least two tracking targets of the plurality of tracking targets overlap between the detection frames associated in the i-th frame and that the overlap is canceled between the detection frames associated in the j-th frame includes:
calculating the intersection-over-union (IOU) between the detection frames associated with the plurality of tracking targets in the j-th frame;
if the IOU between the detection frames of tracking targets in the j-th frame is greater than a specific threshold, determining that the detection frames of those at least two tracking targets overlap,
and if the IOU between the detection frames of the tracking targets in the j-th frame is less than or equal to the specific threshold, determining that the detection frames of the at least two tracking targets do not overlap.
In one embodiment, in the multi-target tracking method of the first aspect, associating the candidate detection frames of the plurality of tracking targets in the j-th frame with the regions of interest (ROIs) of the plurality of tracking targets in the (j-1)-th frame includes:
calculating the IOU between the ROI determined for each of the plurality of tracking targets in the (j-1)-th frame and the candidate detection frames of the plurality of tracking targets in the j-th frame, and taking, for each tracking target, the candidate detection frame whose IOU value is the largest and greater than a specific threshold as that tracking target's detection frame in the j-th frame.
In one embodiment, the multi-target tracking method of the first aspect further includes the following processing of the candidate detection frames of the plurality of tracking targets:
after each of the plurality of tracking targets is associated with a candidate detection frame of the j-th frame, deleting the associated detection frame from the candidate-detection-frame queue;
for a tracking target that is not successfully associated over several consecutive frames, deleting the tracking target from the tracking-target queue;
and for a candidate detection frame that is not successfully associated but appears persistently over several consecutive frames, taking the candidate detection frame as a new tracking target and adding it to the tracking-target queue.
In a second aspect, there is provided a multi-target tracking apparatus, the apparatus comprising:
an acquisition module, configured to acquire candidate detection frames of a plurality of tracking targets in a j-th frame through target detection;
an association module, configured to associate the candidate detection frames of the plurality of tracking targets in the j-th frame with the regions of interest (ROIs) of the plurality of tracking targets in the (j-1)-th frame, to obtain the detection frames corresponding to each of the plurality of tracking targets in the j-th frame;
a determining module, configured to determine that at least two of the plurality of tracking targets overlap between the detection frames associated in an i-th frame and that the overlap is canceled between the detection frames associated in a j-th frame;
and a classification module, configured to reclassify the detection frames of the at least two tracking targets in the j-th frame using the classification recognition model, to obtain reclassified detection frames for the at least two tracking targets in the j-th frame, so that the reclassified detection frames of the at least two tracking targets in the j-th frame are associated with the ROIs of the at least two tracking targets in the (i-1)-th frame, where i and j are positive integers and i < j.
In one embodiment, the multi-target tracking apparatus of the second aspect further includes:
an establishing module, configured to establish a classification recognition model of the plurality of tracking targets;
and an updating module, configured to update the classification recognition model using the ROIs of the at least two tracking targets from the frames before the overlap occurred, when it is determined that the at least two tracking targets overlap between the detection frames associated in the i-th frame, wherein the classification module reclassifies the detection frames associated with the at least two tracking targets in the j-th frame using the updated classification recognition model.
In one embodiment, in the multi-target tracking apparatus of the second aspect, the determining module is specifically configured to:
determine that at least two tracking targets of the plurality of tracking targets overlap between the detection frames associated in the i-th frame and that the overlap is canceled between the detection frames associated in the j-th frame;
calculate the intersection-over-union (IOU) between the detection frames associated with the plurality of tracking targets in the j-th frame;
determine that the detection frames of at least two tracking targets overlap if the IOU between them in the j-th frame is greater than a specific threshold; and determine that the detection frames of the at least two tracking targets do not overlap if the IOU between them in the j-th frame is less than or equal to the specific threshold.
In one embodiment, in the multi-target tracking device of the second aspect, the association module is specifically configured to:
associate the candidate detection frames of the plurality of tracking targets in the j-th frame with the regions of interest (ROIs) of the plurality of tracking targets in the (j-1)-th frame;
calculate the IOU between the ROI determined for each tracking target in the (j-1)-th frame and the candidate detection frames of the plurality of tracking targets in the j-th frame, and take, for each tracking target, the candidate detection frame whose IOU value is the largest and greater than a specific threshold as that tracking target's detection frame in the j-th frame.
In one embodiment, in the multi-target tracking device of the second aspect, the association module is further configured to:
after each of the plurality of tracking targets is associated with a candidate detection frame of the j-th frame, delete the associated detection frame from the candidate-detection-frame queue;
for a tracking target that is not successfully associated over several consecutive frames, delete the tracking target from the tracking-target queue;
and for a candidate detection frame that is not successfully associated but appears persistently over several consecutive frames, take the candidate detection frame as a new tracking target and add it to the tracking-target queue.
Yet another aspect of the invention provides a computer-readable storage medium having computer-executable instructions stored thereon, where the executable instructions, when executed by a processor, implement the method described above.
Yet another aspect of the present invention provides a computer apparatus comprising a memory, a processor, and executable instructions stored in the memory and executable on the processor, where the processor, when executing the executable instructions, implements the method described above.
In summary, the embodiment of the present invention provides a multi-target tracking method that acquires candidate detection frames of a plurality of tracking targets in a j-th frame through target detection; associates the candidate detection frames of the plurality of tracking targets in the j-th frame with the ROIs of the plurality of tracking targets in the (j-1)-th frame to obtain the detection frames corresponding to each tracking target in the j-th frame; determines that at least two of the plurality of tracking targets overlap between the detection frames associated in an i-th frame and that the overlap is canceled between the detection frames associated in the j-th frame; and reclassifies the detection frames associated with the at least two tracking targets in the j-th frame using a classification recognition model to obtain reclassified detection frames for the at least two tracking targets in the j-th frame, so that the reclassified detection frames are associated with the ROIs of the at least two tracking targets in the (i-1)-th frame, where i and j are positive integers and i < j. The invention ensures that tracking targets that overlap during tracking are correctly matched to their own positions once the overlap is canceled, thereby ensuring the accuracy of multi-target tracking.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
For a clearer description of the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the description below are only some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art, wherein:
FIG. 1 is a schematic flowchart of a multi-target tracking method according to an embodiment of the invention;
FIG. 2 is a schematic flow diagram of the overall multi-target tracking process according to an embodiment of the invention;
FIG. 3 is a schematic flow chart of fine tuning of a classification model when tracking object overlap occurs for use in multi-object tracking in accordance with an embodiment of the invention.
Fig. 4 is a block diagram of a multi-target tracking apparatus 400 according to an exemplary embodiment of the invention.
FIG. 5 is a block diagram of a computer device for multi-target tracking, according to an exemplary embodiment of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
FIG. 1 is a schematic flow chart of a multi-target tracking method according to an embodiment of the invention. The method of fig. 1 may be performed by a computing device, e.g., a server. The target tracking method of fig. 1 includes the following.
110: Obtain candidate detection frames of a plurality of tracking targets in the j-th frame through target detection.
In one image (e.g., each frame of image in a video), a closed region that is distinct from the surrounding environment is often referred to as a target. The process of giving the position of the object in the image is called detection. For example, the location of multiple tracked objects in the current video frame and their category information may be detected using an already trained object detection network or model.
For example, the candidate detection frames may be acquired as follows: label pictures of the application scene, or of scenes close to it, and train a deep-learning-based target detector; then use the target detector to acquire the candidate detection frames of the plurality of tracking targets in the j-th frame.
120: Associate the candidate detection frames of the plurality of tracking targets in the j-th frame obtained in step 110 with the regions of interest (ROIs) of the plurality of tracking targets in the (j-1)-th frame, to obtain the detection frames corresponding to each tracking target in the j-th frame.
In machine vision and image processing, the region to be processed, called the region of interest (ROI), is outlined in the image as a box, circle, ellipse, irregular polygon, or the like. Because the region of interest is small, processing time is reduced and accuracy is increased. In the embodiments of the present invention, the ROI is a rectangular box, used for illustration.
Association, also known as data association, is a typical processing method in multi-target tracking tasks, used to solve the matching problem between targets. For example, data association may be performed using the intersection-over-union (IOU); embodiments of the present invention are not limited thereto, and other association methods may be used, such as probabilistic data association, joint probabilistic data association, and multiple-hypothesis tracking algorithms.
Tracking means determining the position (ROI) of an object in a certain frame, obtaining relevant information about the object, such as color features and gradient features, and then searching for the object's position in subsequent frames.
130: Determine that at least two of the plurality of tracking targets overlap between the detection frames associated in the i-th frame and that the overlap is canceled between the detection frames associated in the j-th frame.
Specifically, whether tracking targets overlap between their associated detection frames can be determined by calculating the intersection-over-union (IOU). The IOU is computed pairwise between the detection frames associated with the tracking targets in the same frame: for example, it can be computed whether the detection frame associated with each tracking target overlaps with the detection frames associated with the other tracking targets. If the IOU between two of the tracking targets is greater than 0.3, the two tracking targets overlap each other. Likewise, if the IOU between a further tracking target and either of two already-overlapping tracking targets is greater than 0.3, that target overlaps them as well; the number of mutually overlapping tracking targets may thus be more than two. Conversely, when the computed IOU of two tracking targets is less than or equal to 0.3, the two tracking targets do not overlap each other. Here, overlap may also be referred to as occlusion. It should be understood that the method for determining overlap is not limited to the IOU; other methods for determining that tracking targets overlap may also be used.
In the embodiment of the present invention, according to the determination method described above, at least two tracking targets overlap at the i-th frame, and the overlap is canceled at a later j-th frame.
140: At the moment the overlap is canceled, reclassify the detection frames associated in the j-th frame with the at least two previously overlapping tracking targets, using the classification recognition model, to obtain the reclassified detection frames of the at least two tracking targets in the j-th frame, so that the reclassified detection frames are correctly associated with the ROIs of the at least two tracking targets in the (i-1)-th frame.
Here i and j are positive integers, and i < j.
Specifically, target detection is first performed on each frame of the video to determine the candidate detection frames of the plurality of tracking targets, and the candidate detection frames in the current frame are associated with the ROIs of the tracking targets in the previous frame to obtain the detection frame corresponding to each tracking target in the current frame. Further, if several tracking targets overlap in some frame and the overlap is canceled in a later frame, the detection frames associated with the previously overlapping tracking targets can be reclassified with the classification recognition model in the frame where the overlap is canceled, so that each reclassified detection frame is associated with the correct target's ROI from the frame before the overlap occurred.
After tracking targets overlap each other, performing data association based only on the ROI information of the preceding and following frames may produce incorrect trajectory information for the targets.
In the embodiment of the invention, according to the above determination method, when the at least two tracking targets cancel their overlap in the j-th frame, the detection frames associated with the previously overlapping tracking targets can be reclassified using the classification recognition model, so that the previously overlapping tracking targets obtain correctly associated detection frames after the overlap, thereby ensuring tracking accuracy. In addition, because no separate model needs to be built for each tracking target, the computational complexity is reduced and real-time tracking is preserved.
Optionally, the mutually overlapping tracking targets may go unjudged as to whether their matching is correct for the entire duration of the overlap.
Optionally, as another embodiment, the method of FIG. 1 further includes: establishing a classification recognition model of the plurality of tracking targets; and, when it is determined that the at least two tracking targets overlap between the detection frames associated in the i-th frame, updating the classification recognition model using the ROIs of the at least two tracking targets from the frames before the overlap occurred. In that case, at 140, the detection frames associated with the at least two tracking targets in the j-th frame may be reclassified using the updated classification recognition model.
The classification recognition model may be built and trained as follows: label several videos of the application scene, or of scenes close to it, sampling every few frames, and train a deep-learning-based classification recognition model. Further, the classification recognition model may be retrained using the ROIs of the tracking targets from before they overlapped, and the retrained model may then be used to reclassify the detection frames of the previously overlapping tracking targets.
Based on this embodiment, during multi-target tracking, when tracking targets overlap each other, the pre-trained classification recognition model can be fine-tuned: for example, it is retrained with the tracking targets' ROIs from before the overlap occurred, and after the overlap is canceled, the detection frames of the tracking targets are reclassified with the fine-tuned model. Because the classification recognition model is retrained on the ROIs of the very targets that overlap, its accuracy improves, further ensuring the accuracy and real-time performance of the multi-target tracking trajectories.
Optionally, as another embodiment, the method of fig. 1 further includes: the number of output nodes of the last fully connected layer of the convolutional network of the classification recognition model is modified to be equal to the number of at least two tracking targets.
In other words, there are N trace objects overlapping, and the number of output nodes of the last fully connected layer is changed to N. Because the updated classification recognition model only needs to classify at least two tracking targets but not all tracking targets, the speed of classifying the classification recognition model can be improved by correspondingly reducing the number of output nodes of the full-connection layer.
Optionally, as another embodiment, in the method of FIG. 1, determining that at least two of the plurality of tracking targets overlap between the detection frames associated in the i-th frame and that the overlap is canceled between the detection frames associated in the j-th frame includes:
calculating the intersection-over-union (IOU) between the detection frames associated with the plurality of tracking targets in the j-th frame;
if the IOU between the detection frames of tracking targets in the j-th frame is greater than a specific threshold, determining that the detection frames of those at least two tracking targets overlap;
if the IOU between the detection frames of the tracking targets in the j-th frame is less than or equal to the specific threshold, determining that the detection frames of the at least two tracking targets do not overlap.
Specifically, the intersection-over-union (IOU) of two candidate detection frames is calculated as:

IOU = area(BOX1 ∩ BOX2) / area(BOX1 ∪ BOX2)

where BOX1 and BOX2 denote the two tracking frames, the numerator is the area of their intersection, and the denominator is the area of their union.
Determining overlap requires calculating the IOU: when the IOU is greater than 0.3, it indicates that at least two of the plurality of tracking targets overlap each other.
This IOU is computed between the detection frames associated with all the tracking targets within the current frame. It is described in more detail in step 130 and is not repeated here.
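As an illustration, the IOU above can be computed for axis-aligned rectangular boxes as in the following Python sketch; the (x1, y1, x2, y2) corner layout is an assumption, since the patent does not fix a coordinate convention:

```python
def iou(box1, box2):
    """IOU of two axis-aligned boxes given as (x1, y1, x2, y2) corners."""
    # Intersection rectangle (empty if the boxes do not meet).
    ix1, iy1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    ix2, iy2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    # Union = sum of the two areas minus the intersection.
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - inter
    return inter / union if union > 0 else 0.0

# Per the description above, two targets count as overlapping
# when iou(a, b) > 0.3.
```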
Optionally, as another embodiment, in the method of FIG. 1, associating the candidate detection frames of the plurality of tracking targets in the j-th frame with the ROIs of the plurality of tracking targets in the (j-1)-th frame includes:
calculating the IOU between the ROI determined for each of the plurality of tracking targets in the (j-1)-th frame and the candidate detection frames of the plurality of tracking targets in the j-th frame, and taking, for each tracking target, the candidate detection frame whose IOU value is the largest and greater than a specific threshold as that tracking target's detection frame in the j-th frame.
Specifically, the data association operation is to calculate the IOU between the ROI determined for each tracking target in the (j-1)-th frame and all the candidate detection frames in the j-th frame; the concept and calculation of the IOU were described above and are not repeated here. It should be understood that the association method is not limited to the IOU; other methods of matching tracking targets to detections may also be used.
According to an embodiment of the invention, the IOU data used for association is recorded in a matrix. For example, if the (j-1)-th frame has N tracking targets and the j-th frame has M candidate detection frames, the IOU data is recorded in an N×M matrix A, where each element A(n, m) is the IOU between the n-th tracking target's ROI in the (j-1)-th frame and the m-th candidate detection frame in the j-th frame.
Assuming that the same target moves very little between adjacent frames, the detection whose position is closest to the tracking target's position in the previous frame is selected as the target's position in the current frame: the candidate detection frame of the current frame with the largest IOU, provided that IOU exceeds a specific threshold, is taken as the position of the tracking target in the current frame. This threshold is typically around 0.3. A sketch of this association step follows.
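The following Python sketch implements the greedy max-IOU association just described, building the N×M matrix with the `iou` function sketched earlier; the list-of-boxes inputs and the 0.3 default are illustrative assumptions:

```python
import numpy as np

def associate(tracks, candidates, threshold=0.3):
    """Greedy IOU association. tracks: ROIs of the N targets in frame j-1;
    candidates: the M candidate detection frames in frame j.
    Returns {track_index: candidate_index} for successful associations."""
    if not tracks or not candidates:
        return {}
    # A(n, m) = IOU between track n's ROI and candidate m.
    A = np.array([[iou(t, c) for c in candidates] for t in tracks])
    matches = {}
    for n in range(len(tracks)):
        m = int(np.argmax(A[n]))      # candidate with the largest IOU for track n
        if A[n, m] > threshold:
            matches[n] = m
            A[:, m] = -1.0            # matched candidate leaves the candidate queue
    return matches
```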
Optionally, as another embodiment, the method of FIG. 1 further includes: after each of the plurality of tracking targets is associated with a candidate detection frame of the j-th frame, deleting the associated detection frame from the candidate-detection-frame queue; for a tracking target that is not successfully associated over several consecutive frames, deleting the tracking target from the tracking-target queue; and for a candidate detection frame that is not successfully associated but appears persistently over several consecutive frames, taking the candidate detection frame as a new tracking target and adding it to the tracking-target queue.
Specifically, for a given video frame, the plurality of tracking targets in it can be detected with a trained target detection network or model, and these detections form the candidate-detection-frame queue. The tracking-target queue is the set of targets determined over all frames preceding the current one. After a tracking target in the tracking-target queue is associated with a candidate detection frame of the frame, that detection frame is deleted from the candidate-detection-frame queue. The purpose is to reduce the number of detection frames that need subsequent analysis, improving tracking speed and efficiency.
For a tracking target that fails to associate with any candidate detection frame in a given video frame, if it also fails to associate over the next several consecutive frames, it is deleted from the tracking-target queue. The purpose is to reduce the number of tracking targets and the interference they cause, improving tracking speed and efficiency.
For a candidate detection frame that is not successfully associated with any tracking target in a given video frame, if it continues to appear over the next several frames, it is taken as a new tracking target and added to the tracking-target queue. This is done because an object that appears repeatedly within a small area over a short time can essentially be confirmed as an object to be tracked. The purpose is to make tracking more comprehensive and accurate. A sketch of this queue maintenance follows.
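A minimal sketch of the queue maintenance described above; MAX_MISSES and MIN_HITS are illustrative parameters, not values fixed by the patent, and the dict layout of tracks and detections is an assumption:

```python
MAX_MISSES = 5   # drop a track after this many consecutive failed associations
MIN_HITS = 3     # promote a detection seen this many consecutive frames to a track

def update_queues(tracks, detections, matches):
    """tracks: list of {'box': ..., 'misses': int}; detections: list of
    {'box': ..., 'hits': int} with hit counts maintained across frames;
    matches: {track_index: detection_index} from associate()."""
    matched_dets = set(matches.values())
    for n, trk in enumerate(tracks):
        if n in matches:
            trk['box'] = detections[matches[n]]['box']  # take the matched detection
            trk['misses'] = 0
        else:
            trk['misses'] += 1                          # no association this frame
    # Delete tracking targets that failed to associate for too long.
    tracks = [t for t in tracks if t['misses'] < MAX_MISSES]
    # Persistently unmatched detections become new tracking targets.
    for m, det in enumerate(detections):
        if m not in matched_dets and det['hits'] >= MIN_HITS:
            tracks.append({'box': det['box'], 'misses': 0})
    return tracks
```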
According to an embodiment of the present invention, the candidate detection frames may be acquired as follows: label pictures of the application scene, or of scenes close to it, and train a deep-learning-based target detector; then use the target detector to acquire the candidate detection frames of the plurality of tracking targets in any frame to be processed.
According to an embodiment of the present invention, the classification recognition model of the plurality of tracking targets can be established as follows: for the application scene, or several videos close to it, label the frames one by one or every few frames, and use them to train a deep-learning-based classification recognition model.
FIG. 2 is a schematic flow diagram of an overall process for multi-target tracking according to one embodiment of the invention.
The multi-target tracking technique can track the motion trajectories of a plurality of targets over a period of time, solving the problem of perceiving dynamic objects.
As described above, before step 110, the following steps are further included:
210: Build and train the network model.
The network model includes a deep-learning-based target detection model, such as a Faster R-CNN (Faster Region-based Convolutional Neural Network), SSD (Single Shot MultiBox Detector), or YOLO (You Only Look Once) deep neural network model. For example, pictures of the application scene, or of scenes close to it, are annotated to train the deep-learning-based target detector.
Specifically, the deep-learning-based target detection model takes a picture as input and obtains a feature map of the image through multiple layers of convolution. Anchor coordinate frames are then designed according to the size distribution of objects in the image; anchor positions are classified and bounding-box regression is performed to find positions where objects may exist, and these positions are mapped as regions of interest (ROIs) onto the feature map. The convolutional features at each position are extracted and, through extensive inner-product computation with the trained parameters, turned into vector features, which undergo further classification and bounding-box regression, finally yielding the positions and categories of the objects in the image. The role of the convolution layers is to extract image features: after two-dimensional convolution and bias operations are applied to the input image, a nonlinear activation function produces the convolution result, i.e., an image feature.
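Purely as an illustration, an off-the-shelf pre-trained detector of the kind named above can stand in for the trained detector; the sketch below uses torchvision's Faster R-CNN, which is an assumed substitute, not the detector the patent itself trains:

```python
import torch
import torchvision

# Pre-trained Faster R-CNN as a stand-in for the scene-specific detector.
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
detector.eval()

def detect_candidates(frame_tensor, score_threshold=0.5):
    """frame_tensor: 3xHxW float tensor in [0, 1]. Returns the candidate
    detection boxes (x1, y1, x2, y2) whose scores pass the threshold."""
    with torch.no_grad():
        output = detector([frame_tensor])[0]
    keep = output['scores'] > score_threshold
    return output['boxes'][keep]
```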
In step 210, the network model also includes a deep-learning-based target classification model, such as a ResNet18, CaffeNet, or GoogLeNet classification model. For complex datasets with high inter-class similarity, the classification model is trained with a triplet loss.
Constructing and training the network model includes the following:
extracting the deep features of the tracked targets with a pre-trained deep neural network model, and performing feature extraction and classification training on the tracked targets whose sample count exceeds a certain number, to obtain the classification model.
Specifically, real videos of the application scene, or of scenes close to it, can be collected and labeled. The labeling method can be to take the original sampled images frame by frame or at intervals of several frames, e.g., every 1, 2, or 5 frames, and to label different tracking targets with different ids, thereby building a database for tracking-target image classification; the labeled and tagged pictures in the database are used to train the classification model.
In a deep-learning target classification network, the fully connected layer connects all the features and sends the output values to the classifier.
According to the embodiment of the invention, assuming the counted number of tracking-target classes is N, the number of outputs of the last fully connected layer of the classification network is set to N, so that the classification model can distinguish the N classes.
220: Input video frames.
230: Acquire the candidate detection frames of the targets in the current frame using the deep-learning-based target detector.
This step is similar to step 110 described above and will not be described in further detail herein.
240: Perform data association between the candidate detection frames obtained in the current frame and the regions of interest (ROIs) of the tracking targets in the previous frame, to obtain the detection frame corresponding to each tracking target in the current frame.
250: Determine whether any of the plurality of tracking targets overlap between the detection frames associated in the current frame.
260: If a tracking target's detection frame does not overlap with those of other tracking targets, output the tracking result; that is, carry out data association of the tracking target between adjacent frames using the IOU.
270: If the detection frames of at least two tracking targets overlap, set the corresponding at least two tracking targets to the overlapping state.
After tracking targets overlap each other, performing data association based only on the ROI information of the preceding and following frames may produce incorrect trajectory information for the targets.
280: Determine whether the mutually overlapping tracking targets have canceled the overlap.
The overlapping tracking targets are considered no longer overlapping when their IOU falls below 0.3.
290: Reclassify the detection frames associated with the tracking targets after the overlap is canceled, using the fine-tuned classification model, so that the previously overlapping tracking targets are associated with the correct detection frames after the overlap; that is, the detection frames after the overlap is canceled are correctly associated with each target's ROI from the frame before the overlap occurred, as sketched below.
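A minimal sketch of this reclassification step, assuming a PyTorch classifier fine-tuned as in FIG. 3; `preprocess` (resize and normalization to the classifier's input) is an assumed helper, and `track_ids` lists the ids of the N previously overlapping targets in the order of the classifier's N outputs:

```python
import torch

def reclassify(frame, boxes, classifier, track_ids, preprocess):
    """frame: 3xHxW image tensor; boxes: (x1, y1, x2, y2) detection frames
    after the overlap is canceled. Returns {track_id: box} assignments."""
    classifier.eval()
    assignments = {}
    with torch.no_grad():
        for box in boxes:
            x1, y1, x2, y2 = [int(v) for v in box]
            crop = frame[:, y1:y2, x1:x2]           # cut the ROI out of the frame
            logits = classifier(preprocess(crop).unsqueeze(0))
            k = int(logits.argmax(dim=1))           # which of the N targets this crop is
            assignments[track_ids[k]] = box
    return assignments
```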
To improve running speed and classification quality, the pre-trained classification model is fine-tuned.
FIG. 3 is a schematic flow chart of fine tuning of a classification model when tracking object overlap occurs for use in multi-object tracking in accordance with an embodiment of the invention.
310: Modify the number of output nodes of the last fully connected layer to N.
At this point, the classification network must be trained to learn the characteristics of each of the mutually overlapping tracking targets. Unlike the pre-trained classification model, which distinguishes general categories, the number of classes of this model is the number of mutually overlapping tracking targets.
Assuming that N targets overlap each other, the number of output nodes of the last fully connected layer is modified to N.
320: Load the pre-trained classification model and initialize the parameters of the last fully connected layer.
Modifying the number of output nodes of the last fully connected layer of the classification network makes the number of parameters of that layer mismatch the pre-trained model. The parameters of the last fully connected layer therefore need to be initialized with random values, independent of the pre-trained classification model, so that the layer outputs the required number of nodes.
The pre-trained classification model is loaded and the parameters of the last fully connected layer are initialized, yielding the new, fine-tuned classification model, as sketched below.
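An illustrative sketch of steps 310-320, assuming a PyTorch ResNet-style classifier (the patent does not prescribe a framework, and ResNet18 here is a stand-in for the pre-trained classification model):

```python
import torch.nn as nn
import torchvision

N = 3  # number of mutually overlapping tracking targets (illustrative)

# Step 320, first half: load a pre-trained classification model.
model = torchvision.models.resnet18(pretrained=True)

# Steps 310 and 320, second half: replace the last fully connected layer
# so it outputs N nodes; nn.Linear freshly (randomly) initializes the new
# parameters, independent of the pre-trained weights, which is exactly
# what the mismatched layer shape requires.
model.fc = nn.Linear(model.fc.in_features, N)
```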
330: Retrain the classification model with the ROIs of the overlapping tracking targets from before the overlap.
All ROIs of the at least two overlapping tracking targets are extracted from the frames between the start of tracking and frame i (the moment the overlap occurs), and the modified classification model is retrained for several iterations so that it can reliably identify the mutually overlapping tracking targets.
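A minimal fine-tuning loop under the same assumptions, where `roi_crops` (a batch of pre-overlap ROI crops) and `labels` (indices of the N targets) are hypothetical variables prepared elsewhere:

```python
import torch
import torch.nn as nn

def finetune(model, roi_crops, labels, iterations=50, lr=1e-3):
    """roi_crops: BxCxHxW tensor of pre-overlap ROIs; labels: B target
    indices in [0, N). A few iterations suffice, since mainly the new
    fully connected layer must adapt."""
    model.train()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(iterations):
        optimizer.zero_grad()
        loss = criterion(model(roi_crops), labels)
        loss.backward()
        optimizer.step()
    return model
```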
The mutually overlapping tracking targets go unjudged as to whether their matching is correct for the entire duration of the overlap.
According to the technical solution provided by the embodiment of the invention, when tracking targets are judged to overlap each other, the pre-trained classification model is fine-tuned by retraining it with the tracking targets' ROIs from before the overlap; after the targets cancel the overlap, the corresponding detection frames are reclassified with the fine-tuned classification model. This ensures that each tracking target is correctly matched to its own position when the overlap is canceled, guaranteeing the accuracy and real-time performance of multi-target tracking.
Any combination of the above-described alternative embodiments may be used to form alternative embodiments of the present invention, and will not be described in any detail herein.
The following are examples of the apparatus of the present invention that may be used to perform the method embodiments of the present invention. For details not disclosed in the embodiments of the apparatus of the present invention, please refer to the embodiments of the method of the present invention.
Fig. 4 is a block diagram of a multi-target tracking apparatus 400 according to an exemplary embodiment of the present invention.
As shown in fig. 4, the apparatus 400 includes: an acquisition module 410, configured to acquire candidate detection frames of a plurality of tracking targets in the j-th frame through target detection; an association module 420, configured to associate the candidate detection frames of the plurality of tracking targets in the j-th frame with the ROIs of the plurality of tracking targets in the (j-1)-th frame, to obtain the detection frames corresponding to each tracking target in the j-th frame; a determining module 430, configured to determine that at least two of the plurality of tracking targets overlap between the detection frames associated in the i-th frame and that the overlap is canceled between the detection frames associated in the j-th frame; and a classification module 440, configured to reclassify, using the classification recognition model, the detection frames associated in the j-th frame when the at least two tracking targets cancel the overlap, to obtain the reclassified detection frames of the previously overlapping tracking targets in the j-th frame, so that when the overlap is canceled, the reclassified detection frames are correctly associated with the ROIs from before the overlap.
Optionally, the apparatus 400 further includes: an establishing module 405, configured to establish a classification recognition model of the plurality of tracking targets; and an updating module 435, configured to update the classification recognition model using the ROIs of the at least two tracking targets from the frames before the overlap occurred, when it is determined that the at least two tracking targets overlap between the detection frames associated in the i-th frame, wherein the classification module reclassifies the detection frames associated with the at least two tracking targets in the j-th frame using the updated classification recognition model.
The updating module 435 is specifically configured to: modify the number of output nodes of the last fully connected layer of the convolutional network of the classification recognition model to be equal to the number of the at least two tracking targets.
The determining module 430 is specifically configured to: determine that at least two of the plurality of tracking targets overlap between the detection frames associated in the i-th frame and that the overlap is canceled between the detection frames associated in the j-th frame, by calculating the intersection-over-union (IOU) between the detection frames associated with the plurality of tracking targets in the j-th frame; determining that the detection frames of at least two tracking targets overlap if the IOU between them in the j-th frame is greater than a specific threshold; and determining that the detection frames of the at least two tracking targets do not overlap if the IOU between them in the j-th frame is less than or equal to the specific threshold.
The association module 420 is specifically configured to: associate the candidate detection frames of the plurality of tracking targets in the j-th frame with the regions of interest (ROIs) of the plurality of tracking targets in the (j-1)-th frame; calculate the IOU between the ROI determined for each tracking target in the (j-1)-th frame and the candidate detection frames of the plurality of tracking targets in the j-th frame; and take, for each tracking target, the candidate detection frame whose IOU value is the largest and greater than a specific threshold as that tracking target's detection frame in the j-th frame.
The association module 420 is further configured to: after each of the plurality of tracking targets is associated with a candidate detection frame of the j-th frame, delete the associated detection frame from the candidate-detection-frame queue; for a tracking target that is not successfully associated over several consecutive frames, delete the tracking target from the tracking-target queue; and for a candidate detection frame that is not successfully associated but appears persistently over several consecutive frames, take the candidate detection frame as a new tracking target and add it to the tracking-target queue.
The acquisition module 410 is specifically configured to: acquire the candidate detection frames of the plurality of tracking targets in the j-th frame through target detection, where pictures of the application scene, or of scenes close to it, are annotated to train the deep-learning-based target detector.
The establishing module 405 is specifically configured to: label several videos of the application scene, or of scenes close to it, every few frames, to train the deep-learning-based classification recognition model.
FIG. 5 is a block diagram illustrating a computer device 500 for multi-target tracking according to an exemplary embodiment of the invention.
Referring to fig. 5, the apparatus 500 includes a processing component 510, which further includes one or more processors, and memory resources, represented by a memory 520, for storing instructions, such as application programs, executable by the processing component 510. The application program stored in the memory 520 may include one or more modules, each corresponding to a set of instructions. Further, the processing component 510 is configured to execute the instructions to perform the multi-target tracking method described above.
The apparatus 500 may also include a power component configured to perform power management of the apparatus 500, a wired or wireless network interface configured to connect the apparatus 500 to a network, and an input/output (I/O) interface. The apparatus 500 may operate based on an operating system stored in the memory 520, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
A non-transitory computer-readable storage medium stores instructions that, when executed by a processor of the apparatus 500, cause the apparatus 500 to perform a multi-target tracking method comprising: acquiring candidate detection frames of a plurality of tracking targets in a j-th frame through target detection; associating the candidate detection frames of the plurality of tracking targets in the j-th frame with the ROIs of the plurality of tracking targets in the (j-1)-th frame to obtain the detection frames corresponding to each tracking target in the j-th frame; determining that at least two of the plurality of tracking targets overlap between the detection frames associated in an i-th frame and that the overlap is canceled between the detection frames associated in the j-th frame; and reclassifying the detection frames associated with the at least two tracking targets in the j-th frame using a classification recognition model to obtain reclassified detection frames for the at least two tracking targets in the j-th frame, so that the reclassified detection frames are associated with the ROIs of the at least two tracking targets in the (i-1)-th frame, where i and j are positive integers and i < j.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It will be apparent to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and will not be described in any more detail herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
The foregoing is merely a specific embodiment of the present invention, and the protection scope of the present invention is not limited thereto; variations or substitutions readily conceivable by any person skilled in the art within the technical scope disclosed herein shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A multi-target tracking method, comprising:
acquiring candidate detection frames of a plurality of tracking targets in a j-th frame through target detection;
associating the candidate detection frames of the plurality of tracking targets in the j-th frame with regions of interest (ROIs) of the plurality of tracking targets in the (j-1)-th frame based on the intersection over union (IOU), to obtain detection frames respectively corresponding to the plurality of tracking targets in the j-th frame;
determining that detection frames associated with at least two of the plurality of tracking targets overlap in an i-th frame and that the overlap is eliminated in the j-th frame; and
reclassifying the detection frames associated with the at least two tracking targets in the j-th frame by using a classification recognition model, to obtain reclassified detection frames of the at least two tracking targets in the j-th frame, so that the reclassified detection frames are associated with the ROIs of the at least two tracking targets in the (i-1)-th frame, wherein i and j are positive integers and i < j,
wherein the method further comprises:
establishing a classification recognition model of the plurality of tracking targets;
upon determining that the at least two tracking targets overlap between the detection frames associated with the i-th frame, updating the classification recognition model using the ROIs of the at least two tracking targets in frames before the overlap occurs,
wherein the reclassifying the detection frames associated with the at least two tracking targets in the j-th frame using the classification recognition model comprises:
reclassifying the detection frames associated with the at least two tracking targets in the j-th frame by using the updated classification recognition model.
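The following sketch, reusing the Box alias, Track record, and iou helper from the sketch above, illustrates one way the reclassification of claim 1 could behave. The claim does not fix a concrete classification recognition model; the nearest-mean classifier over gray-level histogram crops below, and the names crop_hist, ClassRecognitionModel, and reclassify, are assumptions made purely for illustration.

import numpy as np

def crop_hist(image: np.ndarray, box: Box) -> np.ndarray:
    """Gray-level histogram of a box crop; an assumed stand-in appearance feature."""
    x1, y1, x2, y2 = (int(v) for v in box)
    patch = image[y1:y2, x1:x2]
    hist, _ = np.histogram(patch, bins=32, range=(0, 256))
    return hist / (hist.sum() + 1e-9)

class ClassRecognitionModel:
    """Assumed model: one mean appearance vector per tracking target."""
    def __init__(self):
        self.means = {}   # target_id -> mean feature vector

    def update(self, target_id: int, feats: list) -> None:
        # Claim 1: update the model with ROIs from frames before the overlap.
        self.means[target_id] = np.mean(feats, axis=0)

    def classify(self, feat: np.ndarray) -> int:
        # Return the stored identity whose appearance is closest to the crop.
        return min(self.means,
                   key=lambda t: float(np.linalg.norm(self.means[t] - feat)))

def reclassify(image: np.ndarray, tracks: list, model: ClassRecognitionModel) -> None:
    # After the overlap clears at frame j, reassign each detection frame
    # to the stored identity whose appearance it matches best.
    for t in tracks:
        if t.box is not None:
            t.target_id = model.classify(crop_hist(image, t.box))

Note that this greedy per-box assignment could map two detection frames to the same identity; a one-to-one assignment (for example, Hungarian matching) would be a natural refinement, on which the claim is silent.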
2. The multi-target tracking method of claim 1, wherein the determining that at least two of the plurality of tracking targets overlap between detection frames associated with an i-th frame and that the overlap is eliminated between detection frames associated with the j-th frame comprises:
calculating the intersection over union (IOU) between the detection frames associated with the plurality of tracking targets in the j-th frame;
determining that the detection frames of at least two of the plurality of tracking targets overlap if the IOU between the detection frames associated with the plurality of tracking targets in the j-th frame is greater than a specific threshold; and
determining that the detection frames of the plurality of tracking targets do not overlap if the IOU between the detection frames associated with the plurality of tracking targets in the j-th frame is smaller than or equal to the specific threshold.
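As an illustrative sketch of the overlap test of claim 2, reusing the iou helper from the first sketch: a pairwise IOU comparison against a threshold suffices. The value 0.3 is an assumption, since the claim speaks only of "a specific threshold".

OVERLAP_THRESHOLD = 0.3   # assumed; the claim leaves the value open

def overlapping_pairs(tracks: list, threshold: float = OVERLAP_THRESHOLD) -> list:
    """Return id pairs whose detection frames in the current frame overlap."""
    pairs = []
    boxed = [t for t in tracks if t.box is not None]
    for m in range(len(boxed)):
        for n in range(m + 1, len(boxed)):
            if iou(boxed[m].box, boxed[n].box) > threshold:
                pairs.append((boxed[m].target_id, boxed[n].target_id))
    return pairs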
3. The multi-target tracking method of claim 1, wherein the associating the candidate detection frames of the plurality of tracking targets in the j-th frame with the regions of interest (ROIs) of the plurality of tracking targets in the (j-1)-th frame based on the intersection over union (IOU) comprises:
calculating the IOU between the ROI determined for each of the plurality of tracking targets in the (j-1)-th frame and each candidate detection frame of the plurality of tracking targets in the j-th frame, and taking, for each tracking target, the candidate detection frame with the largest IOU value that is also greater than a specific threshold as the detection frame of that tracking target in the j-th frame.
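A sketch of the association rule of claim 3, again reusing the earlier definitions: each tracking target takes the unmatched candidate whose IOU against its ROI from the (j-1)-th frame is the largest and above a threshold. The greedy one-target-at-a-time order and the threshold value are assumptions; the claim fixes only the max-IOU selection rule.

MATCH_THRESHOLD = 0.3   # assumed value for "a specific threshold"

def associate(tracks: list, candidates: list, threshold: float = MATCH_THRESHOLD) -> list:
    """Match each track's ROI (frame j-1) to its best candidate box (frame j)."""
    unmatched = list(candidates)   # the candidate detection frame queue
    for track in tracks:
        track.box = None
        if not unmatched:
            continue
        best = max(unmatched, key=lambda b: iou(track.roi, b))
        if iou(track.roi, best) > threshold:
            track.box = best         # detection frame of this target in frame j
            unmatched.remove(best)   # claim 4: remove it from the candidate queue
    return unmatched                 # leftovers may seed new targets (claim 4)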
4. The multi-target tracking method of claim 3, further comprising:
deleting, after each of the plurality of tracking targets is associated with a candidate detection frame in the j-th frame, the associated detection frame from the candidate detection frame queue;
deleting, for a tracking target that is not successfully associated, the tracking target from the tracking target queue if the association fails for a plurality of consecutive frames; and
taking, for a candidate detection frame that is not successfully associated, the candidate detection frame as a new tracking target and adding the new tracking target to the tracking target queue if the candidate detection frame recurs in a plurality of consecutive frames.
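The queue maintenance of claim 4 can be sketched on top of associate() above. MAX_MISSES, MIN_HITS, and the 0.5 overlap used to recognize a recurring candidate across frames are all assumed values; the claim requires only "a plurality of consecutive frames" in each direction.

MAX_MISSES = 5   # assumed: delete a target after this many consecutive failures
MIN_HITS = 3     # assumed: promote a candidate recurring this many frames in a row

def maintain(tracks: list, unmatched: list, pending: list, next_id: int):
    """Age out stale targets and promote persistent unmatched candidates."""
    # Targets that keep failing to associate are dropped from the queue.
    for t in list(tracks):
        t.missed = 0 if t.box is not None else t.missed + 1
        if t.missed >= MAX_MISSES:
            tracks.remove(t)
    # Unmatched candidates recurring over consecutive frames become new targets.
    still_pending = []
    for box in unmatched:
        prior = next((p for p in pending if iou(p[0], box) > 0.5), None)
        hits = prior[1] + 1 if prior else 1
        if hits >= MIN_HITS:
            tracks.append(Track(target_id=next_id, roi=box))
            next_id += 1
        else:
            still_pending.append((box, hits))
    return still_pending, next_id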
5. A multi-target tracking apparatus, comprising:
an acquisition module configured to acquire candidate detection frames of a plurality of tracking targets in a j-th frame through target detection;
an association module configured to associate the candidate detection frames of the plurality of tracking targets in the j-th frame with regions of interest (ROIs) of the plurality of tracking targets in the (j-1)-th frame based on the intersection over union (IOU), to obtain detection frames respectively corresponding to the plurality of tracking targets in the j-th frame;
a determining module configured to determine that detection frames associated with at least two of the plurality of tracking targets overlap in an i-th frame and that the overlap is eliminated in the j-th frame;
a classification module configured to reclassify the detection frames associated with the at least two tracking targets in the j-th frame by using a classification recognition model, to obtain reclassified detection frames of the at least two tracking targets in the j-th frame, so that the reclassified detection frames are associated with the ROIs of the at least two tracking targets in the (i-1)-th frame, wherein i and j are positive integers and i < j;
an establishing module configured to establish a classification recognition model of the plurality of tracking targets; and
an updating module configured to update, upon determining that the at least two tracking targets overlap between the detection frames associated with the i-th frame, the classification recognition model using the ROIs of the at least two tracking targets in frames before the overlap occurs,
wherein the classification module is further configured to reclassify the detection frames associated with the at least two tracking targets in the j-th frame by using the updated classification recognition model.
6. The multi-target tracking apparatus of claim 5, wherein the determining module is specifically configured to:
determine that detection frames associated with at least two of the plurality of tracking targets overlap in an i-th frame and that the overlap is eliminated in the j-th frame;
calculate the intersection over union (IOU) between the detection frames associated with the plurality of tracking targets in the j-th frame; and
determine that the detection frames of at least two of the plurality of tracking targets overlap if the IOU between the detection frames associated with the plurality of tracking targets in the j-th frame is greater than a specific threshold, and determine that the detection frames of the plurality of tracking targets do not overlap if the IOU is smaller than or equal to the specific threshold.
7. The multi-target tracking apparatus of claim 5, wherein the association module is specifically configured to:
associate the candidate detection frames of the plurality of tracking targets in the j-th frame with the regions of interest (ROIs) of the plurality of tracking targets in the (j-1)-th frame based on the intersection over union (IOU); and calculate the IOU between the ROI determined for each of the plurality of tracking targets in the (j-1)-th frame and each candidate detection frame of the plurality of tracking targets in the j-th frame, and take, for each tracking target, the candidate detection frame with the largest IOU value that is also greater than a specific threshold as the detection frame of that tracking target in the j-th frame.
8. The multi-target tracking apparatus of claim 7, wherein the association module is further configured to:
delete, after each of the plurality of tracking targets is associated with a candidate detection frame in the j-th frame, the associated detection frame from the candidate detection frame queue;
delete, for a tracking target that is not successfully associated, the tracking target from the tracking target queue if the association fails for a plurality of consecutive frames; and
take, for a candidate detection frame that is not successfully associated, the candidate detection frame as a new tracking target and add the new tracking target to the tracking target queue if the candidate detection frame recurs in a plurality of consecutive frames.
CN201810069852.4A 2018-01-24 2018-01-24 Multi-target tracking method and device Active CN108470332B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810069852.4A CN108470332B (en) 2018-01-24 2018-01-24 Multi-target tracking method and device

Publications (2)

Publication Number Publication Date
CN108470332A (en)  2018-08-31
CN108470332B (en)  2023-07-07

Family

ID=63266144

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810069852.4A Active CN108470332B (en) 2018-01-24 2018-01-24 Multi-target tracking method and device

Country Status (1)

Country Link
CN (1) CN108470332B (en)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109447121B (en) * 2018-09-27 2020-11-06 清华大学 Multi-target tracking method, device and system for visual sensor network
CN109087510B (en) * 2018-09-29 2021-09-07 讯飞智元信息科技有限公司 Traffic monitoring method and device
CN109215059B (en) * 2018-10-16 2021-06-29 西安建筑科技大学 Local data association method for tracking moving vehicle in aerial video
CN109615641B (en) * 2018-11-23 2022-11-29 中山大学 Multi-target pedestrian tracking system and tracking method based on KCF algorithm
CN109784173A (en) * 2018-12-14 2019-05-21 合肥阿巴赛信息科技有限公司 A kind of shop guest's on-line tracking of single camera
CN111415461B (en) * 2019-01-08 2021-09-28 虹软科技股份有限公司 Article identification method and system and electronic equipment
CN111489284B (en) * 2019-01-29 2024-02-06 北京搜狗科技发展有限公司 Image processing method and device for image processing
CN110047095B (en) * 2019-03-06 2023-07-21 平安科技(深圳)有限公司 Tracking method and device based on target detection and terminal equipment
CN109934849B (en) * 2019-03-08 2022-05-31 西北工业大学 Online multi-target tracking method based on trajectory metric learning
CN109977906B (en) * 2019-04-04 2021-06-01 睿魔智能科技(深圳)有限公司 Gesture recognition method and system, computer device and storage medium
CN110210304B (en) * 2019-04-29 2021-06-11 北京百度网讯科技有限公司 Method and system for target detection and tracking
US10699563B1 (en) * 2019-07-12 2020-06-30 GM Global Technology Operations LLC Multi-sensor multi-object tracking
CN111354023A (en) * 2020-03-09 2020-06-30 中振同辂(江苏)机器人有限公司 Camera-based visual multi-target tracking method
CN111445501B (en) * 2020-03-25 2023-03-24 苏州科达科技股份有限公司 Multi-target tracking method, device and storage medium
CN111402288A (en) * 2020-03-26 2020-07-10 杭州博雅鸿图视频技术有限公司 Target detection tracking method and device
CN111652902B (en) * 2020-06-02 2023-03-28 浙江大华技术股份有限公司 Target tracking detection method, electronic equipment and device
CN112037256A (en) * 2020-08-17 2020-12-04 中电科新型智慧城市研究院有限公司 Target tracking method and device, terminal equipment and computer readable storage medium
CN111986228B (en) * 2020-09-02 2023-06-02 华侨大学 Pedestrian tracking method, device and medium based on LSTM model escalator scene
CN112489090A (en) * 2020-12-16 2021-03-12 影石创新科技股份有限公司 Target tracking method, computer-readable storage medium and computer device
CN113409359B (en) * 2021-06-25 2022-11-29 之江实验室 Multi-target tracking method based on feature aggregation
CN114022803B (en) * 2021-09-30 2023-11-14 苏州浪潮智能科技有限公司 Multi-target tracking method and device, storage medium and electronic equipment
CN116091552B (en) * 2023-04-04 2023-07-28 上海鉴智其迹科技有限公司 Target tracking method, device, equipment and storage medium based on deep SORT

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102214291A (en) * 2010-04-12 2011-10-12 云南清眸科技有限公司 Method for quickly and accurately detecting and tracking human face based on video sequence
CN103259962A (en) * 2013-04-17 2013-08-21 深圳市捷顺科技实业股份有限公司 Target tracking method and related device
CN104732187A (en) * 2013-12-18 2015-06-24 杭州华为企业通信技术有限公司 Method and equipment for image tracking processing
EP3096292A1 (en) * 2015-05-18 2016-11-23 Xerox Corporation Multi-object tracking with generic object proposals
CN106934817A (en) * 2017-02-23 2017-07-07 中国科学院自动化研究所 Based on multiattribute multi-object tracking method and device
CN107066990A (en) * 2017-05-04 2017-08-18 厦门美图之家科技有限公司 A kind of method for tracking target and mobile device
CN107122735A (en) * 2017-04-26 2017-09-01 中山大学 A kind of multi-object tracking method based on deep learning and condition random field
CN107403175A (en) * 2017-09-21 2017-11-28 昆明理工大学 Visual tracking method and Visual Tracking System under a kind of movement background

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9984315B2 (en) * 2015-05-05 2018-05-29 Condurent Business Services, LLC Online domain adaptation for multi-object tracking
CN104834916A (en) * 2015-05-14 2015-08-12 上海太阳能科技有限公司 Multi-face detecting and tracking method
US9760791B2 (en) * 2015-09-01 2017-09-12 Sony Corporation Method and system for object tracking
US11144761B2 (en) * 2016-04-04 2021-10-12 Xerox Corporation Deep data association for online multi-class multi-object tracking
CN106097391B (en) * 2016-06-13 2018-11-16 浙江工商大学 A kind of multi-object tracking method of the identification auxiliary based on deep neural network
CN106971401B (en) * 2017-03-30 2020-09-25 联想(北京)有限公司 Multi-target tracking device and method

Similar Documents

Publication Publication Date Title
CN108470332B (en) Multi-target tracking method and device
Liu et al. Overview and methods of correlation filter algorithms in object tracking
US8050453B2 (en) Robust object tracking system
Kirac et al. Hierarchically constrained 3D hand pose estimation using regression forests from single frame depth data
Lin et al. Integrating graph partitioning and matching for trajectory analysis in video surveillance
US9798923B2 (en) System and method for tracking and recognizing people
Li et al. Real-time grayscale-thermal tracking via laplacian sparse representation
Chang et al. Fast Random‐Forest‐Based Human Pose Estimation Using a Multi‐scale and Cascade Approach
WO2022218396A1 (en) Image processing method and apparatus, and computer readable storage medium
Soleimanitaleb et al. Single object tracking: A survey of methods, datasets, and evaluation metrics
CN110991278A (en) Human body action recognition method and device in video of computer vision system
Lee et al. Visual object detection and tracking using analytical learning approach of validity level
Hsu et al. Human body motion parameters capturing using kinect
CN107194950B (en) Multi-person tracking method based on slow feature analysis
Lu et al. Online visual tracking
Sharma Feature-based efficient vehicle tracking for a traffic surveillance system
Zhou et al. A study on attention-based LSTM for abnormal behavior recognition with variable pooling
Ali et al. Deep Learning Algorithms for Human Fighting Action Recognition.
Sriram et al. Analytical review and study on object detection techniques in the image
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
Naeem et al. Multiple batches of motion history images (MB-MHIs) for multi-view human action recognition
Truong et al. Single object tracking using particle filter framework and saliency-based weighted color histogram
CN114627339B (en) Intelligent recognition tracking method and storage medium for cross border personnel in dense jungle area
Lee et al. Recognizing human-vehicle interactions from aerial video without training
Li et al. Multitarget tracking of pedestrians in video sequences based on particle filters

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant